Intel(R) Math Kernel Library Reference Manual

Intel® Math Kernel Library

Reference Manual

World Wide Web: http://developer.intel.com

March 2007

Disclaimer and Legal Information

Document Number: 630813-025US

DateVersion InformationVersion

11/94Original Issue.-001

5/95Added functions crotg, zrotg. Documented versions of functions ?her2k, ?symm, ?syrk, and?syr2k not previously described. Pagination revised.

-002

1/96Changed the title; former title: “Intel BLAS Library for the Pentium® Processor ReferenceManual.”

-003

Added functions ?rotm, ?rotmg and updated Appendix C.

11/96Documents Intel® Math Kernel library (Intel® MKL) release 2.0 with the parallelism capability.Information on parallelism has been added in Chapter 1 and in section “BLAS Level 3 Routines”in Chapter 2.

-004

8/97Two-dimensional FFTs have been added. C interface has been added to both one- andtwo-dimensional FFTs.

-005

1/98Documents Intel Math Kernel Library release 2.1. Sparse BLAS section has been added inChapter 2.

-006

1/99Documents Intel Math Kernel Library release 3.0. Descriptions of LAPACK routines (Chapters4 and 5) and CBLAS interface (Appendix C) have been added. Quick Reference has beenexcluded from the manual; MKL 3.0 Quick Reference is now available in HTML format.

-007

6/99Documents Intel Math Kernel Library release 3.2. Description of FFT routines have beenrevised. In Chapters 4 and 5 NAG names for LAPACK routines have been excluded.

-008

11/99New LAPACK routines for eigenvalue problems have been added in chapter 5.-009

06/00Documents Intel Math Kernel Library release 4.0. Chapter 6 describing the VML functions hasbeen added.

-010

04/01Documents Intel Math Kernel Library release 5.1. LAPACK section has been extended to includethe full list of computational and driver routines.

-011

07/02Documents Intel Math Kernel Library release 6.0 beta. New DFT interface and Vector StatisticalLibrary functions have been added.

-6001

12/02Documents Intel Math Kernel Library 6.0 beta update. DFT functions description has beenupdated. CBLAS interface description was extended.

-6002

03/03Documents Intel Math Kernel Library 6.0 gold. DFT functions have been updated. AuxiliaryLAPACK routines’ descriptions were added to the manual.

-6003

07/03Documents Intel Math Kernel Library release 6.1.-6004

11/03Documents Intel Math Kernel Library release 7.0 beta. Includes ScaLAPACK and sparse solverdescriptions.

-6005

04/04Documents Intel MKL and Intel® Cluster MKL release 7.0 gold. Auxiliary ScaLAPACK andalternative sparse solver interface were added.

-017

3

DateVersion InformationVersion

03/05Documents Intel MKL and Intel® Cluster MKL release 8.0 beta. Sparse BLAS and DFTIsections were extended. New functionality was added: Sparse BLAS, Cluster DFTI, iterativesparse solver, multiple-precision arithmetic, interval linear solver, andconvolution/correlation. Fortran95 interface to LAPACK functions was added.

-018

08/05Documents Intel MKL and Intel® Cluster MKL release 8.0 gold. Fortran95 interface toBLAS and Sparse BLAS functions has been added.

-019

03/06Documents Intel MKL and Intel Cluster MKL release 8.0.2. PARDISO functionalitydescription has been extended with indefinite symmetric matrices pivoting.

-020

03/06Documents Intel MKL and Intel Cluster MKL release 8.1 gold. Chapter 13 on TrigonometricTransform functions has been added. Information on specific features of Fortran-95implementation for LAPACK routines has been reflected in a new Appendix E and therelevant subsection of Chapter 3.

-021

05/06Documents Intel MKL and Intel Cluster MKL release 9.0 beta. Statistical Functions, FourierTransform Functions (Cluster DFT) descriptions have been updated. Description of newVML functions, RCI Sparse Solvers, and Poisson solver have been added. Chapter 13 hasbeen renamed to PDE Support. Code examples have been expanded.

-022

09/06Documents Intel MKL and Intel Cluster MKL release 9.0 gold. Complex Interval Solvershave been added. Description of the old deprecated FFT functions have been removed.

-023

01/07Documents Intel MKL and Intel Cluster MKL release 9.1 beta. Optimization SolversRoutines and Support Functions chapters have been added. Chapters on Sparse Solversand Partial Differential Equations Support have been extended. LAPACK chapters havebeen partially updated to reflect LAPACK 3.1 version.

-024

03/07Documents Intel MKL and Intel Cluster MKL release 9.1 gold. BLACS chapter has beenadded. Chapters on BLAS and FFT have been extended. LAPACK chapters have beenadditionally updated to reflect LAPACK 3.1 version. Description of BSR format has beenadded to Appendix A. New FFT examples have been added to Appendix C.

-025

4

Intel® Math Kernel Library Reference Manual

INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL(R) PRODUCTS. NO LICENSE,EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTEDBY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS,INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY,RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TOFITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHTOR OTHER INTELLECTUAL PROPERTY RIGHT. Intel products are not intended for use in medical, life saving, lifesustaining, critical control or safety systems, or in nuclear facility applications.

Intel may make changes to specifications and product descriptions at any time, without notice.

The software described in this document may contain software defects which may cause the product to deviatefrom published specifications. Current characterized software defects are available on request.

This document as well as the software described in it is furnished under license and may only be used or copiedin accordance with the terms of the license. The information in this manual is furnished for informational useonly, is subject to change without notice, and should not be construed as a commitment by Intel Corporation.Intel Corporation assumes no responsibility or liability for any errors or inaccuracies that may appear in thisdocument or any software that may be provided in association with this document.

Except as permitted by such license, no part of this document may be reproduced, stored in a retrieval system,or transmitted in any form or by any means without the express written consent of Intel Corporation.

Developers must not rely on the absence or characteristics of any features or instructions marked "reserved" or"undefined." Improper use of reserved or undefined features or instructions may cause unpredictable behavioror failure in developer’s software code when running on an Intel processor. Intel reserves these features orinstructions for future definition and shall have no responsibility whatsoever for conflicts or incompatibilitiesarising from their unauthorized use.

BunnyPeople, Celeron, Celeron Inside, Centrino, Centrino logo, Core Inside, FlashFile, i960, InstantIP, Intel, Intellogo, Intel386, Intel486, Intel740, IntelDX2, IntelDX4, IntelSX2, Intel Core, Intel Inside, Intel Inside logo, Intel.Leap ahead., Intel. Leap ahead. logo, Intel NetBurst, Intel NetMerge, Intel NetStructure, Intel SingleDriver, IntelSpeedStep, Intel StrataFlash, Intel Viiv, Intel vPro, Intel XScale, IPLink, Itanium, Itanium Inside, MCS, MMX,Oplus, OverDrive, PDCharm, Pentium, Pentium Inside, skoool, Sound Mark, The Journey Inside, VTune, Xeon,and Xeon Inside are trademarks of Intel Corporation in the U.S. and other countries.

* Other names and brands may be claimed as the property of others.

Copyright © 1994-2007, Intel Corporation.

Portions © Copyright 2001 Hewlett-Packard Development Company, L.P.

Chapters 4 and 5 include derivative work portions that have been copyrighted:

© 1991, 1992, and 1998 by The Numerical Algorithms Group, Ltd.

5

Contents

Chapter 1: OverviewAbout This Software..........................................................................33

Technical Support.....................................................................34BLAS Routines..........................................................................34Sparse BLAS Routines...............................................................35LAPACK Routines......................................................................35ScaLAPACK Routines.................................................................36Sparse Solver Routines..............................................................36VML Functions..........................................................................36Statistical Functions..................................................................36Fourier Transform Functions.......................................................37Interval Solver Routines.............................................................37Partial Differential Equations Support...........................................37Optimization Solvers Routines....................................................37Support Functions.....................................................................38BLACS Routines........................................................................38GMP Arithmetic Functions..........................................................38Performance Enhancements.......................................................38Parallelism...............................................................................39Platforms Supported..................................................................39

About This Manual.............................................................................40

7

Audience for This Manual.....................................................40Manual Organization...........................................................40Notational Conventions........................................................42

Chapter 2: BLAS and Sparse BLAS RoutinesRoutine Naming Conventions........................................................45Fortran-95 Interface Conventions..................................................47Matrix Storage Schemes..............................................................48BLAS Level 1 Routines and Functions.............................................49

?asum...............................................................................50?axpy...............................................................................51?copy................................................................................53?dot..................................................................................54?sdot................................................................................56?dotc................................................................................57?dotu................................................................................59?nrm2...............................................................................60?rot..................................................................................61?rotg................................................................................63?rotm...............................................................................64?rotmg..............................................................................67?scal.................................................................................69?swap...............................................................................71i?amax..............................................................................72i?amin..............................................................................73dcabs1..............................................................................75

BLAS Level 2 Routines.................................................................75?gbmv..............................................................................77?gemv..............................................................................81?ger..................................................................................84?gerc................................................................................86?geru................................................................................88

8


?hbmv..............................................................................90?hemv..............................................................................93?her..................................................................................96?her2................................................................................98?hpmv.............................................................................101?hpr................................................................................103?hpr2..............................................................................105?sbmv.............................................................................108?spmv.............................................................................111?spr................................................................................114?spr2..............................................................................116?symv.............................................................................118?syr................................................................................121?syr2..............................................................................123?tbmv.............................................................................125?tbsv..............................................................................129?tpmv.............................................................................133?tpsv..............................................................................136?trmv..............................................................................138?trsv...............................................................................141

BLAS Level 3 Routines...............................................................143?gemm............................................................................145?hemm............................................................................149?herk..............................................................................152?her2k............................................................................155?symm............................................................................159?syrk..............................................................................162?syr2k.............................................................................166?trmm.............................................................................170?trsm..............................................................................173

Sparse BLAS Level 1 Routines and Functions.................................176Vector Arguments.............................................................177

9

Contents

Naming Conventions.........................................................177Routines and Data Types....................................................177BLAS Level 1 Routines That Can Work With Sparse Vectors.....178?axpyi.............................................................................179?doti...............................................................................181?dotci..............................................................................182?dotui.............................................................................184?gthr...............................................................................185?gthrz.............................................................................187?roti................................................................................188?sctr...............................................................................190

Sparse BLAS Level 2 and Level 3.................................................191Naming Conventions in Sparse BLAS Level 2 and Level 3.......191Sparse Matrix Data Structures............................................193Routines and Supported Operations.....................................193Interface Consideration.....................................................195Sparse BLAS Level 2 and Level 3 Routines............................200

Chapter 3: LAPACK Routines: Linear EquationsRoutine Naming Conventions......................................................288Fortran-95 Interface Conventions................................................289

MKL Fortran-95 Interfaces for LAPACK Routines vs. NetlibImplementation............................................................291

Matrix Storage Schemes............................................................292Mathematical Notation...............................................................293Error Analysis...........................................................................293Computational Routines.............................................................294

Routines for Matrix Factorization.........................................297Routines for Solving Systems of Linear Equations..................325Routines for Estimating the Condition Number......................362Refining the Solution and Estimating Its Error.......................396Routines for Matrix Inversion..............................................439

10


Routines for Matrix Equilibration.........................................457Driver Routines.........................................................................470

?gesv..............................................................................471?gesvx............................................................................475?gbsv..............................................................................481?gbsvx............................................................................484?gtsv..............................................................................491?gtsvx.............................................................................493?posv..............................................................................498?posvx............................................................................500?ppsv..............................................................................505?ppsvx............................................................................508?pbsv..............................................................................513?pbsvx............................................................................516?ptsv..............................................................................521?ptsvx.............................................................................523?sysv..............................................................................527?sysvx.............................................................................530?hesv..............................................................................535?hesvx............................................................................538?spsv..............................................................................543?spsvx............................................................................545?hpsv..............................................................................550?hpsvx............................................................................552

Chapter 4: LAPACK Routines: Least Squares and EigenvalueProblems

Routine Naming Conventions......................................................558Matrix Storage Schemes............................................................560Mathematical Notation...............................................................560Computational Routines.............................................................561

Orthogonal Factorizations..................................................561

11

Contents

Singular Value Decomposition.............................................643Symmetric Eigenvalue Problems.........................................675Generalized Symmetric-Definite Eigenvalue Problems............756Nonsymmetric Eigenvalue Problems....................................774Generalized Nonsymmetric Eigenvalue Problems...................834Generalized Singular Value Decomposition...........................878

Driver Routines.........................................................................891Linear Least Squares (LLS) Problems...................................892Generalized LLS Problems..................................................909Symmetric Eigenproblems..................................................917Nonsymmetric Eigenproblems...........................................1015Singular Value Decomposition...........................................1040Generalized Symmetric Definite Eigenproblems...................1057Generalized Nonsymmetric Eigenproblems..........................1136

Chapter 5: LAPACK Auxiliary and Utility RoutinesAuxiliary Routines....................................................................1169

?lacgv...........................................................................1184?lacrm...........................................................................1185?lacrt............................................................................1186?laesy...........................................................................1187?rot...............................................................................1189?spmv...........................................................................1190?spr..............................................................................1192?symv...........................................................................1194?syr..............................................................................1196i?max1..........................................................................1197?sum1...........................................................................1198?gbtf2...........................................................................1199?gebd2..........................................................................1201?gehd2..........................................................................1203?gelq2...........................................................................1205

12


?geql2...........................................................................1207?geqr2...........................................................................1209?gerq2...........................................................................1210?gesc2...........................................................................1212?getc2...........................................................................1213?getf2...........................................................................1215?gtts2............................................................................1216?isnan...........................................................................1218?laisnan.........................................................................1218?labrd............................................................................1219?lacn2...........................................................................1222?lacon...........................................................................1224?lacpy...........................................................................1225?ladiv............................................................................1227?lae2.............................................................................1228?laebz...........................................................................1229?laed0...........................................................................1234?laed1...........................................................................1237?laed2...........................................................................1239?laed3...........................................................................1242?laed4...........................................................................1244?laed5...........................................................................1246?laed6...........................................................................1247?laed7...........................................................................1249?laed8...........................................................................1253?laed9...........................................................................1256?laeda...........................................................................1258?laein............................................................................1260?laev2...........................................................................1263?laexc...........................................................................1265?lag2.............................................................................1267?lags2...........................................................................1269

13

Contents

?lagtf............................................................................1271?lagtm...........................................................................1273?lagts............................................................................1275?lagv2...........................................................................1277?lahqr............................................................................1280?lahrd............................................................................1283?lahr2............................................................................1286?laic1............................................................................1289?laln2............................................................................1291?lals0............................................................................1295?lalsa............................................................................1299?lalsd............................................................................1303?lamrg...........................................................................1306?laneg...........................................................................1307?langb...........................................................................1308?lange...........................................................................1310?langt............................................................................1311?lanhs...........................................................................1313?lansb...........................................................................1314?lanhb...........................................................................1316?lansp...........................................................................1318?lanhp...........................................................................1320?lanst/?lanht..................................................................1322?lansy...........................................................................1323?lanhe...........................................................................1325?lantb............................................................................1327?lantp............................................................................1329?lantr............................................................................1331?lanv2...........................................................................1333?lapll.............................................................................1334?lapmt...........................................................................1335?lapy2...........................................................................1337

14


?lapy3...........................................................................1337?laqgb...........................................................................1338?laqge...........................................................................1340?laqhb...........................................................................1342?laqp2...........................................................................1344?laqps...........................................................................1346?laqr0............................................................................1348?laqr1............................................................................1352?laqr2............................................................................1354?laqr3............................................................................1358?laqr4............................................................................1362?laqr5............................................................................1366?laqsb...........................................................................1370?laqsp...........................................................................1372?laqsy...........................................................................1374?laqtr............................................................................1375?lar1v............................................................................1378?lar2v............................................................................1381?larf..............................................................................1383?larfb............................................................................1385?larfg............................................................................1387?larft.............................................................................1389?larfx............................................................................1392?largv............................................................................1393?larnv............................................................................1395?larra............................................................................1396?larrb............................................................................1398?larrc............................................................................1400?larrd............................................................................1402?larre............................................................................1406?larrf.............................................................................1410?larrj.............................................................................1413

15

Contents

?larrk............................................................................1415?larrr.............................................................................1416?larrv............................................................................1418?lartg............................................................................1422?lartv............................................................................1424?laruv............................................................................1425?larz..............................................................................1426?larzb............................................................................1428?larzt............................................................................1431?las2.............................................................................1434?lascl.............................................................................1436?lasd0...........................................................................1437?lasd1...........................................................................1439?lasd2...........................................................................1443?lasd3...........................................................................1447?lasd4...........................................................................1450?lasd5...........................................................................1452?lasd6...........................................................................1453?lasd7...........................................................................1458?lasd8...........................................................................1462?lasd9...........................................................................1464?lasda...........................................................................1467?lasdq...........................................................................1471?lasdt............................................................................1474?laset............................................................................1475?lasq1...........................................................................1476?lasq2...........................................................................1477?lasq3...........................................................................1479?lasq4...........................................................................1480?lasq5...........................................................................1481?lasq6...........................................................................1483?lasr..............................................................................1484

16


?lasrt............................................................................1488?lassq............................................................................1489?lasv2...........................................................................1491?laswp...........................................................................1492?lasy2...........................................................................1494?lasyf............................................................................1496?lahef............................................................................1498?latbs............................................................................1501?latdf............................................................................1503?latps............................................................................1506?latrd............................................................................1508?latrs............................................................................1512?latrz............................................................................1516?lauu2...........................................................................1519?lauum..........................................................................1520?lazq3...........................................................................1521?lazq4...........................................................................1524?org2l/?ung2l.................................................................1526?org2r/?ung2r................................................................1527?orgl2/?ungl2.................................................................1529?orgr2/?ungr2................................................................1531?orm2l/?unm2l...............................................................1532?orm2r/?unm2r..............................................................1535?orml2/?unml2...............................................................1537?ormr2/?unmr2..............................................................1540?ormr3/?unmr3..............................................................1542?pbtf2...........................................................................1545?potf2...........................................................................1547?ptts2............................................................................1548?rscl..............................................................................1550?sygs2/?hegs2................................................................1551?sytd2/?hetd2................................................................1553

17

Contents

?sytf2............................................................................1555?hetf2...........................................................................1557?tgex2...........................................................................1559?tgsy2...........................................................................1562?trti2.............................................................................1566clag2z...........................................................................1568dlag2s...........................................................................1569slag2d...........................................................................1570zlag2c...........................................................................1571

Utility Functions and Routines...................................................1572ilaver.............................................................................1573ilaenv............................................................................1573iparmq..........................................................................1576ieeeck...........................................................................1578lsame............................................................................1579lsamen..........................................................................1580?labad...........................................................................1581?lamch..........................................................................1582?lamc1..........................................................................1583?lamc2..........................................................................1584?lamc3..........................................................................1585?lamc4..........................................................................1585?lamc5..........................................................................1586second/dsecnd................................................................1587xerbla............................................................................1588

Chapter 6: ScaLAPACK RoutinesOverview...............................................................................1589Routine Naming Conventions....................................................1590Computational Routines...........................................................1592

Linear Equations.............................................................1592Routines for Matrix Factorization.......................................1593

18


Routines for Solving Systems of Linear Equations................1611Routines for Estimating the Condition Number....................1632Refining the Solution and Estimating Its Error.....................1642Routines for Matrix Inversion............................................1655Routines for Matrix Equilibration........................................1661Orthogonal Factorizations.................................................1667Symmetric Eigenproblems................................................1748Nonsymmetric Eigenvalue Problems...................................1775Singular Value Decomposition...........................................1789Generalized Symmetric-Definite Eigen Problems..................1805

Driver Routines.......................................................................1810p?gesv..........................................................................1811p?gesvx.........................................................................1813p?gbsv..........................................................................1820p?dbsv..........................................................................1823p?dtsv...........................................................................1826p?posv..........................................................................1829p?posvx.........................................................................1831p?pbsv..........................................................................1838p?ptsv...........................................................................1841p?gels...........................................................................1844p?syev...........................................................................1849p?syevx.........................................................................1852p?heevx.........................................................................1860p?gesvd.........................................................................1869p?sygvx.........................................................................1874p?hegvx.........................................................................1883

Chapter 7: ScaLAPACK Auxiliary and Utility RoutinesAuxiliary Routines....................................................................1895

p?lacgv..........................................................................1902p?max1.........................................................................1903

19

Contents

?combamax1..................................................................1904p?sum1.........................................................................1905p?dbtrsv........................................................................1906p?dttrsv.........................................................................1910p?gebd2........................................................................1914p?gehd2........................................................................1918p?gelq2.........................................................................1922p?geql2.........................................................................1924p?geqr2.........................................................................1927p?gerq2.........................................................................1930p?getf2..........................................................................1933p?labrd..........................................................................1935p?lacon..........................................................................1940p?laconsb......................................................................1942p?lacp2..........................................................................1943p?lacp3..........................................................................1945p?lacpy..........................................................................1947p?laevswp......................................................................1949p?lahrd..........................................................................1951p?laiect..........................................................................1954p?lange.........................................................................1956p?lanhs..........................................................................1958p?lansy, p?lanhe.............................................................1961p?lantr..........................................................................1964p?lapiv..........................................................................1967p?laqge.........................................................................1971p?laqsy..........................................................................1973p?lared1d......................................................................1976p?lared2d......................................................................1977p?larf............................................................................1979p?larfb...........................................................................1983p?larfc...........................................................................1988

20


p?larfg...........................................................................1992p?larft...........................................................................1994p?larz............................................................................1998p?larzb..........................................................................2002p?larzc..........................................................................2007p?larzt...........................................................................2011p?lascl...........................................................................2016p?laset..........................................................................2018p?lasmsub.....................................................................2020p?lassq..........................................................................2021p?laswp.........................................................................2023p?latra...........................................................................2025p?latrd..........................................................................2026p?latrs...........................................................................2031p?latrz...........................................................................2034p?lauu2.........................................................................2037p?lauum........................................................................2039p?lawil...........................................................................2040p?org2l/p?ung2l..............................................................2041p?org2r/p?ung2r.............................................................2044p?orgl2/p?ungl2..............................................................2047p?orgr2/p?ungr2.............................................................2050p?orm2l/p?unm2l............................................................2053p?orm2r/p?unm2r...........................................................2057p?orml2/p?unml2............................................................2062p?ormr2/p?unmr2...........................................................2066p?pbtrsv........................................................................2071p?pttrsv.........................................................................2076p?potf2..........................................................................2080p?rscl............................................................................2082p?sygs2/p?hegs2............................................................2083p?sytd2/p?hetd2.............................................................2086

21

Contents

p?trti2...........................................................................2090?lamsh..........................................................................2092?laref............................................................................2094?lasorte.........................................................................2096?lasrt2...........................................................................2097?stein2..........................................................................2098?dbtf2...........................................................................2101?dbtrf............................................................................2103?dttrf.............................................................................2105?dttrsv..........................................................................2106?pttrsv..........................................................................2108?steqr2..........................................................................2110

Utility Functions and Routines...................................................2112p?labad.........................................................................2112p?lachkieee....................................................................2113p?lamch.........................................................................2114p?lasnbt........................................................................2116pxerbla..........................................................................2117

Chapter 8: Sparse Solver RoutinesPARDISO - Parallel Direct Sparse Solver Interface........................2119

pardiso..........................................................................2120Direct Sparse Solver (DSS) Interface Routines............................2136

DSS Interface Description................................................2138dss_create.....................................................................2139dss_define_structure.......................................................2140dss_reorder....................................................................2141dss_factor_real, dss_factor_complex.................................2142dss_solve_real, dss_solve_complex...................................2143dss_delete.....................................................................2145dss_statistics..................................................................2145mkl_cvt_to_null_terminated_str........................................2149

22


Implementation Details....................................................2149Iterative Sparse Solvers based on Reverse Communication Interface

(RCI ISS)...........................................................................2151CG Interface Description..................................................2156FGMRES Interface Description...........................................2162dcg_init.........................................................................2171dcg_check......................................................................2172dcg...............................................................................2173dcg_get.........................................................................2175dcgmrhs_init..................................................................2176dcgmrhs_check...............................................................2178dcgmrhs........................................................................2179dcgmrhs_get..................................................................2182dfgmres_init...................................................................2183dfgmres_check...............................................................2184dfgmres.........................................................................2185dfgmres_get...................................................................2188Implementation Details....................................................2189

Preconditioners or Accelerators based on Incomplete LU FactorizationTechnique...........................................................................2190

ILU0 Preconditioner Interface Description...........................2193dcsrilu0.........................................................................2194

Calling Sparse Solver Routines From C/C++...............................2198

Chapter 9: Vector Mathematical FunctionsData Types and Accuracy Modes................................................2201Function Naming Conventions...................................................2202

Functions Interface.........................................................2203Vector Indexing Methods..........................................................2205Error Diagnostics.....................................................................2206VML Mathematical Functions.....................................................2207

Inv................................................................................2208

23

Contents

Div................................................................................2210Sqrt..............................................................................2211InvSqrt..........................................................................2213Cbrt..............................................................................2214InvCbrt..........................................................................2215Pow...............................................................................2216Powx.............................................................................2218Hypot............................................................................2221Exp...............................................................................2222Ln.................................................................................2224Log10............................................................................2226Cos...............................................................................2227Sin................................................................................2229SinCos...........................................................................2231Tan...............................................................................2232Acos..............................................................................2234Asin..............................................................................2235Atan..............................................................................2237Atan2............................................................................2239Cosh.............................................................................2240Sinh..............................................................................2242Tanh..............................................................................2244Acosh............................................................................2245Asinh.............................................................................2247Atanh............................................................................2249Erf................................................................................2250Erfc...............................................................................2252ErfInv............................................................................2253Floor.............................................................................2255Ceil...............................................................................2256Trunc.............................................................................2257Round...........................................................................2258

24


NearbyInt......................................................................2259Rint...............................................................................2261Modf.............................................................................2262

VML Pack/Unpack Functions......................................................2263Pack..............................................................................2264Unpack..........................................................................2266

VML Service Functions.............................................................2269SetMode........................................................................2269GetMode........................................................................2272SetErrStatus...................................................................2273GetErrStatus..................................................................2275ClearErrStatus................................................................2275SetErrorCallBack.............................................................2276GetErrorCallBack.............................................................2279ClearErrorCallBack..........................................................2280

Chapter 10: Statistical FunctionsRandom Number Generators.....................................................2281

Conventions...................................................................2282Basic Generators.............................................................2289Error Reporting...............................................................2293Service Routines.............................................................2294Distribution Generators....................................................2323Advanced Service Routines...............................................2385

Convolution and Correlation......................................................2394Overview.......................................................................2394Naming Conventions........................................................2396Data Types.....................................................................2397Parameters....................................................................2397Task Status and Error Reporting........................................2400Task Constructors...........................................................2402Task Editors...................................................................2414

25

Contents

Task Execution Routines...................................................2421Task Destructors.............................................................2433Task Copy......................................................................2435Usage Examples.............................................................2436Mathematical Notation and Definitions...............................2440Data Allocation...............................................................2441

Chapter 11: Fourier Transform FunctionsDFT Functions.........................................................................2445

Computing DFT...............................................................2446DFT Interface.................................................................2447Status Checking Functions................................................2448Descriptor Manipulation Functions.....................................2452DFT Computation Functions..............................................2458Descriptor Configuration Functions....................................2464Configuration Settings.....................................................2471

Cluster DFT Functions..............................................................2499Computing Cluster DFT....................................................2501Distributing Data among Processes....................................2502Cluster DFT Interface......................................................2505Descriptor Manipulation Functions.....................................2506DFT Computation Functions..............................................2511Descriptor Configuration Functions....................................2517Error Codes....................................................................2525

Chapter 12: Interval Linear SolversRoutine Naming Conventions....................................................2528Routines for Fast Solution of Interval Systems.............................2529

?trtrs.............................................................................2529?gegas..........................................................................2531?gehss...........................................................................2533

26


?gekws..........................................................................2535?gegss...........................................................................2537?gehbs..........................................................................2539

Routines for Sharp Solution of Interval Systems..........................2540?gepps..........................................................................2540?gepss...........................................................................2543

Routines for Inverting Interval Matrices......................................2547?trtri.............................................................................2547?geszi............................................................................2548

Routines for Checking Properties of Interval Matrices...................2549?gerbr...........................................................................2549?gesvr...........................................................................2551

Auxiliary and Utility Routines....................................................2553?gemip..........................................................................2553

Chapter 13: Partial Differential Equations SupportTrigonometric Transform Routines..............................................2557

Transforms Implemented.................................................2557Sequence of Invoking TT Routines.....................................2559Interface Description.......................................................2561TT Routines....................................................................2561Common Parameters.......................................................2571Implementation Details....................................................2575

Poisson Library Routines ..........................................................2579Poisson Library Implemented............................................2580Sequence of Invoking PL Routines.....................................2587Interface Description.......................................................2591PL Routines for the Cartesian Solver..................................2592PL Routines for the Spherical Solver..................................2608Common Parameters.......................................................2617Implementation Details....................................................2626

Calling PDE Support Routines from Fortran-90.............................2639

27

Contents

Chapter 14: Optimization Solvers RoutinesOrganization and Implementation..............................................2643Nonlinear Least-Squares Problem without Constraints..................2645

dtrnlsp_init....................................................................2646dtrnlsp_solve..................................................................2647dtrnlsp_get....................................................................2649dtrnlsp_delete................................................................2651Examples of dtrnlsp Usage...............................................2652

Nonlinear Least-Squares Problem with Linear (Bound) Constraints..2666dtrnlspbc_init.................................................................2667dtrnlspbc_solve..............................................................2668dtrnlspbc_get.................................................................2670dtrnlspbc_delete.............................................................2671Examples of dtrnlspbc Usage............................................2672

Jacobi Matrix Calculation Routines.............................................2688djacobi_init....................................................................2689djacobi_solve.................................................................2690djacobi_delete................................................................2691djacobi..........................................................................2691Examples of djacobi_solve Usage......................................2693Examples of djacobi Usage...............................................2700

Chapter 15: Support FunctionsVersion Information Functions...................................................2706

MKLGetVersion...............................................................2706MKLGetVersionString.......................................................2709

Error Handling Functions..........................................................2710xerbla............................................................................2710pxerbla..........................................................................2711

Equality Test Functions.............................................................2712

28


lsame............................................................................2712lsamen..........................................................................2712

Timing Functions.....................................................................2713second/dsecnd................................................................2713getcpuclocks..................................................................2714getcpufrequency.............................................................2714setcpufrequency.............................................................2715

Memory Functions...................................................................2715MKL_FreeBuffers.............................................................2715

Chapter 16: BLACS RoutinesInitialization Routines..............................................................2719

blacs_pinfo....................................................................2720blacs_setup....................................................................2720blacs_get.......................................................................2721blacs_set.......................................................................2723blacs_gridinit..................................................................2724blacs_gridmap................................................................2725

Destruction Routines................................................................2727blacs_freebuff.................................................................2727blacs_gridexit.................................................................2728blacs_abort....................................................................2728blacs_exit......................................................................2729

Informational Routines.............................................................2730blacs_gridinfo.................................................................2730blacs_pnum...................................................................2731blacs_pcoord..................................................................2731

Miscellaneous Routines............................................................2732blacs_barrier..................................................................2732

Examples of BLACS Routines Usage...........................................2734

29

Contents

Appendix A: Linear Solvers BasicsSparse Linear Systems.............................................................2743

Matrix Fundamentals.......................................................2744Direct Method.................................................................2745Sparse Matrix Storage Formats.........................................2751

Interval Linear Systems...........................................................2766Intervals........................................................................2766Interval vectors and matrices...........................................2767Preconditioning...............................................................2772Inverting interval matrices...............................................2773

Appendix B: Routine and Function ArgumentsVector Arguments in BLAS........................................................2775Vector Arguments in VML.........................................................2776Matrix Arguments....................................................................2777

Appendix C: Code ExamplesBLAS Code Examples...............................................................2783PARDISO Code Examples.........................................................2790

Examples for Sparse Symmetric Linear Systems..................2790Examples for Sparse Unsymmetric Linear Systems..............2802

Direct Sparse Solver Code Examples..........................................2816Iterative Sparse Solver Code Examples......................................2832Fourier Transform Functions Code Examples................................2865

DFT Code Examples.........................................................2866Examples for Cluster DFT Functions...................................2891

Interval Linear Solvers Code Examples.......................................2894PDE Support Code Examples.....................................................2905

Trigonometric Transforms Interface Code Examples.............2906Poisson Library Code Examples.........................................2926

30


Appendix D: CBLAS Interface to the BLASCBLAS Arguments...................................................................2963Level 1 CBLAS........................................................................2964Level 2 CBLAS........................................................................2968Level 3 CBLAS........................................................................2977Sparse CBLAS.........................................................................2981

Appendix E: Specific Features of Fortran-95 Interfaces forLAPACK Routines

Interfaces Identical to Netlib.....................................................2984Interfaces with Replaced Argument Names.................................2987Modified Netlib Interfaces.........................................................2989Interfaces Absent From Netlib...................................................2991Interfaces of New Functionality.................................................2997

Appendix F: Optimization Solvers BasicsNonlinear Least Square Problem................................................2999Trust Region Algorithm.............................................................3000

Bibliography

Glossary

Index

31

Contents

32


1Overview

The Intel® Math Kernel Library (Intel® MKL) provides Fortran routines and functions that perform a widevariety of operations on vectors and matrices including sparse matrices and interval matrices. The libraryalso includes discrete Fourier transform routines, as well as vector mathematical and vector statisticalfunctions with Fortran and C interfaces.

The version of the library named Intel® Cluster MKL is a superset of Intel MKL and includes also ScaLAPACKsoftware and Cluster DFT software for solving respective computational problems on distributed-memoryparallel computers.

The Intel MKL enhances performance of the application programs that use it because the library has beenoptimized for latest generations of Intel® processors.

This chapter introduces the Intel Math Kernel Library and provides information about the organization ofthis manual.

About This SoftwareThe Intel® Math Kernel Library includes the following groups of routines:

• Basic Linear Algebra Subprograms (BLAS):

– vector operations

– matrix-vector operations

– matrix-matrix operations

• Sparse BLAS Level 1, 2, and 3 (basic operations on sparse vectors and matrices)

• LAPACK routines for solving systems of linear equations

• LAPACK routines for solving least-squares problems, eigenvalue and singular value problems,and Sylvester's equations

• Auxiliary and utility LAPACK routines

• ScaLAPACK computational, driver and auxiliary routines (for Intel Cluster MKL only)

• Direct and Iterative Sparse Solver routines

• Vector Mathematical Library (VML) functions for computing core mathematical functions on vectorarguments (with Fortran and C interfaces)

• Vector Statistical Library (VSL): functions for generating vectors of pseudorandom numbers withdifferent types of statistical distributions and for performing convolution and correlationcomputations

33

• General Discrete Fourier Transform Functions (DFT), providing fast computation of DFT viathe Fast Fourier Transform (FFT) algorithms and having Fortran and C interfaces

• Cluster DFT functions (for Intel Cluster MKL only)

• Interval Solver routines for solving systems of interval linear equations

• Tools for solving partial differential equations - trigonometric transform routines and Poissonsolver

• Optimization Solver routines for solving nonlinear least squares problems through theTrust-Region (TR) algorithms and computing Jacobi matrix by central differences

• GMP arithmetic functions.

For specific issues on using the library, please refer to the MKL Release Notes.

Technical Support

Intel MKL provides a product web site that offers timely and comprehensive product information,including product features, white papers, and technical articles. For the latest information,check: http://developer.intel.com/software/products/

Intel also provides a support web site that contains a rich repository of self help information,including getting started tips, known product issues, product errata, license information, userforums, and more (visit http://support.intel.com/support/performancetools/libraries/mkl).

Registering your product entitles you to one year of technical support and product updatesthrough Intel® Premier Support. Intel Premier Support is an interactive issue management andcommunication web site providing these services:

• Submit issues and review their status.

• Download product updates anytime of the day.

To register your product, contact Intel, or seek product support, please visit:http://developer.intel.com/software/products/support.

BLAS Routines

BLAS routines and functions are divided into the following groups according to the operationsthey perform:

• BLAS Level 1 Routines and Functions perform operations of both addition and reduction onvectors of data. Typical operations include scaling and dot products.

• BLAS Level 2 Routines perform matrix-vector operations, such as matrix-vector multiplication,rank-1 and rank-2 matrix updates, and solution of triangular systems.

34

1 Intel® Math Kernel Library Reference Manual

• BLAS Level 3 Routines perform matrix-matrix operations, such as matrix-matrix multiplication,rank-k update, and solution of triangular systems.

Starting from release 8.0, Intel® MKL also supports Fortran-95 interface to BLAS routines.

Sparse BLAS Routines

Sparse BLAS Level 1 Routines and Functions and Sparse BLAS Level 2 Routines and Level 3routines and functions operate on sparse vectors and matrices. These routines perform vectoroperations similar to BLAS Level 1, 2, and 3 routines. Sparse BLAS routines take advantage ofvector and matrix sparsity: they allow you to store only non-zero elements of vectors andmatrices. Intel MKL also supports Fortran-95 interface to Sparse BLAS routines.

LAPACK Routines

The Intel® Math Kernel Library fully supports LAPACK 3.0 set of computational, driver, auxiliaryand utility routines and partially supports LAPACK 3.1 routines - one computational routine(?stemr) and a number of auxiliary and utility routines.

The original versions of LAPACK from which that part of Intel MKL was derived can be obtainedfrom http://www.netlib.org/lapack/index.html. The authors of LAPACK are E. Anderson, Z. Bai,C. Bischof, S. Blackford, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling,A. McKenney, and D. Sorensen.

The LAPACK routines can be divided into the following groups according to the operations theyperform:

• Routines for solving systems of linear equations, factoring and inverting matrices, andestimating condition numbers (see Chapter 3).

• Routines for solving least-squares problems, eigenvalue and singular value problems, andSylvester's equations (see Chapter 4 ).

• Auxiliary and utility routines used to perform certain subtasks, common low-level computationor related tasks (see Chapter 5 ).

Starting from release 8.0, Intel MKL also supports Fortran-95 interface to LAPACK computationaland driver routines. This interface provides an opportunity for simplified calls of LAPACK routineswith fewer required arguments.

35

Overview 1

ScaLAPACK Routines

ScaLAPACK package (included with Intel® Cluster MKL only, see Chapter 6 and Chapter 7 )runs on distributed-memory architectures and includes routines for solving systems of linearequations, solving linear least-squares problems, eigenvalue and singular value problems, aswell as performing a number of related computational tasks.

The original versions of ScaLAPACK from which that part of Intel Cluster MKL was derived canbe obtained from http://www.netlib.org/scalapack/index.html. The authors of ScaLAPACK areL. Blackford, J. Choi, A.Cleary, E. D'Azevedo, J. Demmel, I. Dhillon, J. Dongarra, S. Hammarling,G. Henry, A. Petitet, K.Stanley, D. Walker, and R. Whaley.

Intel Cluster MKL version of ScaLAPACK is optimized for Intel® processors and uses MPICHversion of MPI as well as Imtel MPI.

Sparse Solver Routines

Direct sparse solver routines in Intel MKL (see Chapter 8 ) solve symmetric andsymmetrically-structured sparse matrices with real or complex coefficients. For symmetricmatrices, these Intel MKL subroutines can solve both positive definite and indefinite systems.Intel MKL includes the PARDISO* sparse solver interface as well as an alternative set of usercallable direct sparse solver routines.

If you use the sparse solver PARDISO* from Intel MKL, please cite:

O.Schenk and K.Gärtner. Solving unsymmetric sparse systems of linear equations with PARDISO.J. of Future Generation Computer Systems, 20(3):475- 487, 2004.

Intel MKL provides also an iterative sparse solver (see Chapter 8 ) that uses sparse BLAS level2 and 3 routines and works with different sparse data formats.

VML Functions

Vector Mathematical Library (VML) functions (see Chapter 9 ) include a set of highly optimizedimplementations of certain computationally expensive core mathematical functions (power,trigonometric, exponential, hyperbolic etc.) that operate on vectors of real and complex numbers.

Application programs that might significantly improve performance with VML include nonlinearprogramming software, integrals computation, and many others. VML provides interfaces bothfor FORTRAN and C languages.

Statistical Functions

Vector Statistical Library (VSL) contains two sets of functions (see Chapter 10 ):

36


• The first set includes a collection of pseudo- and quasi-random number generator subroutinesimplementing basic continuous and discrete distributions. To provide best performance, VSLsubroutines use calls to highly optimized Basic Random Number Generators and a libraryof vector mathematical functions.

• The second set includes a collection of routines that implement a wide variety of convolutionand correlation operations.

Fourier Transform Functions

The Intel® MKL multidimensional Discrete Fourier Transform functions with mixed radix support(see Chapter 11 ) provide uniformity of DFT computation and combine functionality with easeof use. Both Fortran and C interface specification are given. There is also a cluster version ofDFT functions, which runs on distributed-memory architectures and is provided with Intel ClusterMKL package.

DFT functions provide fast computation via the Fast Fourier Transform (FFT) algorithms notonly for lengths that are powers of 2 but also for 3, 5, 7, 11, and other radices.

Interval Solver Routines

Interval Solver routines included into Intel® MKL (see Chapter 12 ) can be used to solve intervalsystems of linear equations and related problems.

Partial Differential Equations Support

Intel® MKL provides tools for solving Partial Differential Equations (PDE) (see Chapter 13 ).These tools are Trigonometric Transform interface routines and Poisson Library.

The Trigonometric Transform routines may be helpful to users who implement their own solverssimilar to the solver that the Poisson Library provides. The users can improve performance oftheir solvers by using fast sine, cosine, and staggered cosine transforms implemented in theTrigonometric Transform interface.

Poisson Library is designed for fast solving of simple Helmholtz, Poisson, and Laplace problems.The Trigonometric Transform interface, which underlies the solver, is based on Intel MKL DFTinterface (refer to Chapter 11 ), optimized for Intel® processors.

Optimization Solvers Routines

Intel® MKL provides Optimization Solvers routines (see Chapter 14 ) that can be used to solvenonlinear least squares problems with or without linear (bound) constraints through theTrust-Region (TR) algorithms and compute Jacobi matrix by central differences.

37

Overview 1

See also Appendix A Optimization Solvers Basics for description of the basic notions ofoptimization solvers, nonlinear least square problems, and the TR algorithm.

Support Functions

Intel® MKL support functions (see Chapter 15 ) are used to support the operation of Intel MKLsoftware and provide basic information on the library and library operation, such as the currentlibrary version, timing, setting and measuring of CPU frequency, error handling, and memoryallocation.

BLACS Routines

Intel® Math Kernel Library implements routines from the BLACS (Basic Linear AlgebraCommunication Subprograms) package (see Chapter 16 ) that are used to support a linearalgebra oriented message passing interface that may be implemented efficiently and uniformlyacross a large range of distributed memory platforms.

The original versions of BLACS from which that part of Intel MKL was derived can be obtainedfrom http://www.netlib.org/blacs/index.html. The authors of BLACS are Jack Dongarra and R.Clint Whaley.

GMP Arithmetic Functions

Intel® MKL implementation of GMP arithmetic functions includes arbitrary precision arithmeticoperations on integer numbers. The interfaces of such functions fully match the GNU MultiplePrecision (GMP) Arithmetic Library.

Performance Enhancements

The Intel® Math Kernel Library has been optimized by exploiting both processor and systemfeatures and capabilities. Special care has been given to those routines that most profit fromcache-management techniques. These especially include matrix-matrix operation routines suchas dgemm().

In addition, code optimization techniques have been applied to minimize dependencies ofscheduling integer and floating-point units on the results within the processor.

The major optimization techniques used throughout the library include:

• Loop unrolling to minimize loop management costs.

• Blocking of data to improve data reuse opportunities.

• Copying to reduce chances of data eviction from cache.

38


• Data prefetching to help hide memory latency.

• Multiple simultaneous operations (for example, dot products in dgemm) to eliminate stallsdue to arithmetic unit pipelines.

• Use of hardware features such as the SIMD arithmetic units, where appropriate.

These are techniques from which the arithmetic code benefits the most.

Parallelism

In addition to the performance enhancements discussed above, the Intel® MKL offers performancegains through parallelism provided by the symmetric multiprocessing performance (SMP)feature. You can obtain improvements from SMP in the following ways:

• One way is based on user-managed threads in the program and further distribution of theoperations over the threads based on data decomposition, domain decomposition, controldecomposition, or some other parallelizing technique. Each thread can use any of the IntelMKL functions because the library has been designed to be thread-safe.

• Another method is to use the FFT and BLAS level 3 routines. They have been parallelizedand require no alterations of your application to gain the performance enhancements ofmultiprocessing. Performance using multiple processors on the level 3 BLAS shows excellentscaling. Since the threads are called and managed within the library, the application doesnot need to be recompiled thread-safe (see also Fortran-95 Interface Conventions in Chapter2 ).

• Yet another method is to use tuned LAPACK routines. Currently these include the single-and double precision flavors of routines for QR factorization of general matrices, triangularfactorization of general and symmetric positive-definite matrices, solving systems of equationswith such matrices, as well as solving symmetric eigenvalue problems.

For instructions on setting the number of available processors for the BLAS level 3 and LAPACKroutines, see the Intel® MKL User's Guide.

Platforms Supported

The Intel® Math Kernel Library includes Fortran routines and functions optimized for Intel®processor-based computers running operating systems that support multiprocessing. In additionto the Fortran interface, the Intel MKL includes a C-language interface for the Discrete Fouriertransform functions, as well as for the Vector Mathematical Library and Vector Statistical Libraryfunctions. For hardware and software requirements to use Intel MKL, see MKL Release Notes.

39

Overview 1

About This ManualThis manual describes the routines and functions of the Intel® MKL and Intel® Cluster MKL. Eachreference section describes a routine group typically consisting of routines used with four basicdata types: single-precision real, double-precision real, single-precision complex, anddouble-precision complex.

Each routine group is introduced by its name, a short description of its purpose, and the callingsequence, or syntax, for each type of data with which each routine of the group is used. Thefollowing sections are also included:

Describes the operation performed by routines of the group based onone or more equations. The data types of the arguments are definedin general terms for the group.

Description

Defines the data type for each parameter on entry, for example:Input ParametersREAL for saxpyaDOUBLE PRECISION for daxpy

Lists resultant parameters on exit.Output Parameters

Audience for This Manual

The manual addresses programmers proficient in computational mathematics and assumes aworking knowledge of the principles and vocabulary of linear algebra, mathematical statistics,and Fourier transforms.

Manual Organization

The manual contains the following chapters and appendixes:

Overview. Introduces the Intel Math Kernel Library software, providesinformation on manual organization, and explains notationalconventions.

Chapter 1

BLAS and Sparse BLAS Routines. Provides descriptions of BLAS andSparse BLAS functions and routines.

Chapter 2

LAPACK Routines: Linear Equations. Provides descriptions of LAPACKroutines for solving systems of linear equations and performing anumber of related computational tasks: triangular factorization, matrixinversion, estimating the condition number of matrices.

Chapter 3

40


LAPACK Routines: Least Squares and Eigenvalue Problems. Providesdescriptions of LAPACK routines for solving least-squares problems,standard and generalized eigenvalue problems, singular value problems,and Sylvester's equations.

Chapter 4

LAPACK Auxiliary and Utility Routines. Describes auxiliary and utilityLAPACK routines that perform certain subtasks or common low-levelcomputation.

Chapter 5

ScaLAPACK Routines. Describes ScaLAPACK computational and driverroutines (software included with Intel Cluster MKL only).

Chapter 6

ScaLAPACK Auxiliary and Utility Routines. Describes ScaLAPACK auxiliaryroutines (software included with Intel Cluster MKL only).

Chapter 7

Sparse Solver Routines. Describes direct sparse solver routines thatsolve symmetric and symmetrically-structured sparse matrices. Alsodescribes the iterative sparse solver routines.

Chapter 8

Vector Mathematical Functions. Provides descriptions of VML functionsfor computing elementary mathematical functions on vector arguments.

Chapter 9

Statistical Functions. Provides descriptions of VSL functions forgenerating vectors of pseudorandom numbers and for performingconvolution and correlation operations.

Chapter 10

Fourier Transform Functions. Describes multidimensional functions forcomputing the Discrete Fourier Transform and cluster DFT functions(software included with Intel Cluster MKL only).

Chapter 11

Interval Linear Solvers. Describes routines that can be used to solveinterval systems of linear equations and related problems.

Chapter 12

Partial Differential Equations Support. Describes TrigonometricTransform interface routines and Poisson Library implemented forsolving Partial Differential Equations (PDE).

Chapter 13

Optimization Solvers Routines. Describes routines for solvingoptimization problems. These routines solve nonlinear least squaresproblems through the Trust-Region (TR) algorithms and compute Jacobimatrix by central differences.

Chapter 14

Support Functions. Describes functions that are used to support theoperation of Intel MKL software, such as status information functions,timing and error handling functions.

Chapter 15

BLACS Routines. Describes Basic Linear Algebra CommunicationSubprograms that are used to support a linear algebra oriented messagepassing interface.

Chapter 16

Linear Solvers Basics. Briefly describes the basic definitions andapproaches used in linear algebra for solving systems of linearequations. Describes sparse data storage formats, as well as basicconcepts of interval arithmetic.

Appendix A

41

Overview 1

Routine and Function Arguments. Describes the major arguments ofthe BLAS routines and VML functions: vector and matrix arguments.

Appendix B

Code Examples. Provides code examples of calling various Intel MKLfunctions and routines (BLAS, PARDISO, Direct and Iterative SparseSolver, DFT, Cluster DFT, Intreval Linear Solvers, TrigonometricTransforms, and Poisson Library).

Appendix C

CBLAS Interface to the BLAS. Provides the C interface to the BLAS.Appendix DSpecific Features of Fortran-95 Interfaces for LAPACK Routines. Providesthe features of Intel MKL Fortran-95 interfaces for LAPACK routines incomparison with Netlib implementation.

Appendix E

Optimization Solvers Basics. Briefly describes the basic notions ofoptimization solvers, nonlinear least square problem, and Trust Regionalgorithm.

Appendix F

The manual also includes a Bibliography, Glossary and an Index.

Notational Conventions

This manual uses the following notational conventions:

• Routine name shorthand (?ungqr instead of cungqr/zungqr).

• Font conventions used for distinction between the text and the code.

Routine Name Shorthand

For shorthand, character codes are represented by a question mark “?” names of routine groups.The question mark is used to indicate any or all possible varieties of a function; for example:

Refers to all four data types of the vector-vector ?swap routine: sswap,dswap, cswap, and zswap.

?swap

Font Conventions

The following font conventions are used:

Data type used in the discussion of input and output parameters forFortran interface. For example, CHARACTER*1.

UPPERCASE COURIER

Code examples:lowercase couriera(k+i,j) = matrix(i,j)and data types for C interface, for example, const float*

42


Function names for C interface, for example, vmlSetModelowercase courier

mixed with UpperCase

courier

Variables in arguments and parameters discussion. For example, incx.lowercase courier

italic

Used as a multiplication symbol in code examples and equations andwhere required by the Fortran syntax.

*

43

Overview 1

2BLAS and Sparse BLAS Routines

This chapter contains descriptions of the BLAS and Sparse BLAS routines of the Intel® Math Kernel Library.The routine descriptions are arranged in several sections according to the BLAS level of operation:

• BLAS Level 1 Routines and Functions (vector-vector operations)

• BLAS Level 2 Routines (matrix-vector operations)

• BLAS Level 3 Routines (matrix-matrix operations)

• Sparse BLAS Level 1 Routines and Functions (vector-vector operations).

• Sparse BLAS Level 2 and Level 3 (matrix-vector and matrix-matrix operations)

Each section presents the routine and function group descriptions in alphabetical order by routine orfunction group name; for example, the ?asum group, the ?axpy group. The question mark in the groupname corresponds to different character codes indicating the data type (s, d, c, and z or their combination);see Routine Naming Conventions.

When BLAS or Sparse BLAS routines encounter an error, they call the error reporting routine xerbla. Tobe able to view error reports, you must include xerbla in your code. A copy of the source code for xerblais included in the library.

In BLAS Level 1 groups i?amax and i?amin, an “i” is placed before the character code and correspondsto the index of an element in the vector. These groups are placed in the end of the BLAS Level 1 section.

Routine Naming ConventionsBLAS routine names have the following structure:

<character code> <name> <mod> ( )

The <character code> is a character that indicates the data type:

real, single precisions

complex, single precisionc

real, double precisiond

complex, double precisionz

Some routines and functions can have combined character codes, such as sc or dz.

For example, the function scasum uses a complex input array and returns a real value.

45

The <name> field, in BLAS level 1, indicates the operation type. For example, the BLAS level 1routines ?dot, ?rot, ?swap compute a vector dot product, vector rotation, and vector swap,respectively.

In BLAS level 2 and 3, <name> reflects the matrix argument type:

general matrixge

general band matrixgb

symmetric matrixsy

symmetric matrix (packed storage)sp

symmetric band matrixsb

Hermitian matrixhe

Hermitian matrix (packed storage)hp

Hermitian band matrixhb

triangular matrixtr

triangular matrix (packed storage)tp

triangular band matrix.tb

The <mod> field, if present, provides additional details of the operation. BLAS level 1 namescan have the following characters in the <mod> field:

conjugated vectorc

unconjugated vectoru

Givens rotation.g

BLAS level 2 names can have the following characters in the <mod> field:

matrix-vector productmv

solving a system of linear equations with matrix-vector operationssv

rank-1 update of a matrixr

rank-2 update of a matrix.r2

BLAS level 3 names can have the following characters in the <mod> field:

matrix-matrix productmm

solving a system of linear equations with matrix-matrix operationssm

rank-k update of a matrixrk

rank-2k update of a matrix.r2k

The examples below illustrate how to interpret BLAS routine names:

46

<d> <dot>: double-precision real vector-vector dot productddot

<c> <dot> <c>: complex vector-vector dot product, conjugatedcdotc

<sc> <asum>: sum of magnitudes of vector elements, single precisionreal output and single precision complex input

scasum

<c> <dot> : vector-vector dot product, unconjugated, complexcdotu

<s> <ge> <mv>: matrix-vector product, general matrix, single precisionsgemv

<z> <tr> <mm>: matrix-matrix product, triangular matrix,double-precision complex.

ztrmm

Sparse BLAS naming conventions are similar to those of BLAS level 1. For more information,see “Naming Conventions”.

Fortran-95 Interface ConventionsFortran-95 interface to BLAS and Sparse BLAS Level 1 routines is implemented through wrappersthat call respective Fortran-77 routines. This interface uses such features of Fortran-95 asassumed-shape arrays and optional arguments to provide simplified calls to BLAS and SparseBLAS Level 1 routines with fewer arguments.

The main conventions that are used in Fortran-95 interface are as follows:

• The names of arguments used in Fortran-95 call are typically the same as for the respectivegeneric (Fortran-77) interface. However, to reduce the number of argument names used inthe library, the following identity settings of formal argument names were made:

Fortran-95Argument Name

Generic ArgumentName

aap

Note that these name changes of formal arguments have no impact on program semanticsand follow the conventions of unification names.

• Input arguments such as array dimensions are not required in Fortran-95 and are skippedfrom the calling sequence. Array dimensions are reconstructed from the user data that mustexactly follow the required array shape.

Also, an argument can be skipped if its value is completely defined by the presence orabsence of another argument in the calling sequence, and the restored value is the onlymeaningful value for the skipped argument.

47

BLAS and Sparse BLAS Routines 2

• Arguments incx and incy are skipped. In all cases their values are assumed to be 1. Onecan obtain the effect of the values of incx and incy not being equal to 1 by usingcorresponding Fortran-95 feature: index incrementing may be directly established in actualarguments. Other possibility to obtain this effect is to use Fortran-77 call.

• Some generic arguments are declared as optional in Fortran-95 interface and may or maynot be present in the calling sequence. An argument can be declared optional if it satisfiesone of the following conditions:

1. If an input argument can take only a few possible values, it can be declared as optional.The default value of such argument is typically set as the first value in the list and allexceptions to this rule are explicitly stated in the routine description.

2. If an input argument has a natural default value, it can be declared as optional. Thedefault value of such optional argument is set to its natural default value.

• Optional arguments are given in square brackets in Fortran-95 call syntax.

The concrete rules used for reconstructing the values of omitted optional parameters are specificfor each routine and are detailed in the respective “Fortran-95 Notes” subsection given at theend of routine specification section. If this subsection is omitted, the Fortran-95 interface forthe given routine does not differ from the corresponding Fortran-77 interface.

Note that this interface is not implemented in the current version of Sparse BLAS Level 2 andLevel 3 routines. Fortran-95 interfaces for each these routines is given in the “Interfaces -Fortran-95” subsection at the end of the respective routine specification section.

Matrix Storage SchemesMatrix arguments of BLAS routines can use the following storage schemes:

• Full storage: a matrix A is stored in a two-dimensional array a, with the matrix element aij

stored in the array element a(i,j).

• Packed storage scheme allows you to store symmetric, Hermitian, or triangular matricesmore compactly: the upper or lower triangle of the matrix is packed by columns in aone-dimensional array.

• Band storage: a band matrix is stored compactly in a two-dimensional array: columns ofthe matrix are stored in the corresponding columns of the array, and diagonals of the matrixare stored in rows of the array.

For more information on matrix storage schemes, see “Matrix Arguments” in Appendix B.

48


BLAS Level 1 Routines and FunctionsBLAS Level 1 includes routines and functions, which perform vector-vector operations. Table2-1 lists the BLAS Level 1 routine and function groups and the data types associated withthem.

Table 2-1 BLAS Level 1 Routine Groups and Their Data Types

DescriptionData TypesRoutine orFunction Group

Sum of vector magnitudes (functions)s, d, sc, dz?asum

Scalar-vector product (routines)s, d, c, z?axpy

Copy vector (routines)s, d, c, z?copy

Dot product (functions)s, d?dot

Dot product with extended precision (functions)sd, d?sdot

Dot product conjugated (functions)c, z?dotc

Dot product unconjugated (functions)c, z?dotu

Vector 2-norm (Euclidean norm) a normal or nullvector(functions)

s, d, sc, dz?nrm2

Plane rotation of points (routines)s, d, cs, zd?rot

Givens rotation of points (routines)s, d, c, z?rotg

Modified plane rotation of pointss, d?rotm

Givens modified plane rotation of pointss, d?rotmg

Vector scaling (routines)s, d, c, z, cs, zd?scal

Vector-vector swap (routines)s, d, c, z?swap

Vector maximum value, absolute largest element ofa vector, where i is an index to this value in thevector array (functions)

s, d, c, zi?amax

49


DescriptionData TypesRoutine orFunction Group

Vector minimum value, absolute smallest elementof a vector, where i is an index to this value in thevector array (functions)

s, d, c, zi?amin

Absolute value of a double complex number z.ddcabs1

?asumComputes the sum of magnitudes of the vectorelements.

Syntax

Fortran 77:

res = sasum(n, x, incx )

res = scasum(n, x, incx )

res = dasum(n, x, incx )

res = dzasum(n, x, incx )

Fortran 95:

res = asum(x)

Description

The function ?asum function computes the sum of the magnitudes of elements of a real vector,or the sum of magnitudes of the real and imaginary parts of elements of a complex vector:

res = |Rex (1)| + |Imx (1)| + |Rex(2) | + |Imx (2)|+ ... + |Rex (n)| +|Imx (n)|

where x is a vector of order n.

Input Parameters

INTEGER. Specifies the order of vector x.n

REAL for sasumxDOUBLE PRECISION for dasum

50


COMPLEX for scasumDOUBLE COMPLEX for dzasum

Array, DIMENSION at least (1 + (n-1)*abs(incx)).

INTEGER. Specifies the increment for the elements of x.incx

Output Parameters

REAL for sasumresDOUBLE PRECISION for dasumREAL for scasumDOUBLE PRECISION for dzasumContains the sum of magnitudes of real and imaginary partsof all elements of the vector.

Fortran 95 Interface Notes

Routines in Fortran 95 interface have fewer arguments in the calling sequence than theirFortran77 counterparts. For general conventions applied to skip redundant or reconstructiblearguments, see Fortran-95 Interface Conventions.

Specific details for the routine asum interface are the following:

Holds the array of size(n).x

?axpyComputes a vector-scalar product and adds theresult to a vector.

Syntax

Fortran 77:

call saxpy(n, a, x, incx, y, incy)

call daxpy(n, a, x, incx, y, incy)

call caxpy(n, a, x, incx, y, incy)

call zaxpy(n, a, x, incx, y, incy)

51


Fortran 95:

call axpy(x, y [,a])

Description

The ?axpy routines perform a vector-vector operation defined as

y := a*x + y

where:

a is a scalar

x and y are vectors of order n.

Input Parameters

INTEGER. Specifies the order of vectors x and y.n

REAL for saxpyaDOUBLE PRECISION for daxpyCOMPLEX for caxpyDOUBLE COMPLEX for zaxpySpecifies the scalar a.

REAL for saxpyxDOUBLE PRECISION for daxpyCOMPLEX for caxpyDOUBLE COMPLEX for zaxpyArray, DIMENSION at least (1 + (n-1)*abs(incx)).


REAL for saxpyyDOUBLE PRECISION for daxpyCOMPLEX for caxpyDOUBLE COMPLEX for zaxpyArray, DIMENSION at least (1 + (n-1)*abs(incy)).

INTEGER. Specifies the increment for the elements of y.incy

Output Parameters

Contains the updated vector y.y

52




Specific details for the routine axpy interface are the following:

Holds the array of size(n).x

Holds the array of size (n).y

The default value is 1.a

?copyCopies vector to another vector.

Syntax

Fortran 77:

call scopy(n, x, incx, y, incy)

call dcopy(n, x, incx, y, incy)

call ccopy(n, x, incx, y, incy)

call zcopy(n, x, incx, y, incy)

Fortran 95:

call copy(x, y)

Description

The ?copy routines perform a vector-vector operation defined as

y = x,

where x and y are vectors.

Input Parameters


REAL for scopyxDOUBLE PRECISION for dcopyCOMPLEX for ccopy

53


DOUBLE COMPLEX for zcopyArray, DIMENSION at least (1 + (n-1)*abs(incx)).


REAL for scopyyDOUBLE PRECISION for dcopyCOMPLEX for ccopyDOUBLE COMPLEX for zcopyArray, DIMENSION at least (1 + (n-1)*abs(incy)).


Output Parameters

Contains a copy of the vector x if n is positive. Otherwise,parameters are unaltered.

y



Specific details for the routine copy interface are the following:

Holds the vector of length(n).x

Holds the vector of length (n).y

?dotComputes a vector-vector dot product.

Syntax

Fortran 77:

res = sdot(n, x, incx, y, incy)

res = ddot(n, x, incx, y, incy)

Fortran 95:

res = dot(x, y)

54


Description

The ?dot functions perform a vector-vector reduction operation defined as

where x and y are vectors.

Input Parameters


REAL for sdotxDOUBLE PRECISION for ddotArray, DIMENSION at least (1+(n-1)*abs(incx)).


REAL for sdotyDOUBLE PRECISION for ddotArray, DIMENSION at least (1+(n-1)*abs(incy)).


Output Parameters

REAL for sdotresDOUBLE PRECISION for ddotContains the result of the dot product of x and y, if n ispositive. Otherwise, res contains 0.



Specific details for the routine dot interface are the following:



55


?sdotComputes a vector-vector dot product withextended precision.

Syntax

Fortran 77:

res = sdsdot(n, sb, sx, incx, s y, incy)

res = dsdot(n, s x, incx, s y, incy)

Fortran 95:

res = sdot(sx, sy)

res = sdot(sx, sy, sb)

Description

The ?sdot functions compute the inner product of two vectors with extended precision. Bothfunctions use extended precision accumulation of the intermediate results, but the functionsdsdot outputs the final result in single precision, whereas the function dsdot outputs thedouble precision result. The function sdsdot also adds scalar value sb to the inner product.

Input Parameters

INTEGER. Specifies the number of elements in the inputvectors sx and sy.

n

REAL. Single precision scalar to be added to inner product(for the function sdsdot only).

sb

REAL.sx, syArrays, DIMENSION at least (1+(n -1)*abs(incx)) and(1+(n-1)*abs(incy)), respectively. Contain the inputsingle precision vectors.

INTEGER. Specifies the increment for the elements of sx.incx

INTEGER. Specifies the increment for the elements of sy.incy

Output Parameters

REAL for sdsdotres

56


DOUBLE PRECISION for dsdotContains the result of the dot product of sx and sy (with sbadded for sdsdot), if n is positive. Otherwise, res containssb for sdsdot and 0 for dsdot.



Specific details for the routine sdot interface are the following:

Holds the vector of length(n).sx

Holds the vector of length(n).sy

NOTE. Note that scalar parameter sb is declared as a required parameter in Fortran-95interface for the function sdot to distinguish between function flavors that output finalresult in different precision.

?dotcComputes a dot product of a conjugated vectorwith another vector.

Syntax

Fortran 77:

res = cdotc(n, x, incx, y, incy )

res = zdotc(n, x, incx, y, incy )

Fortran 95:

res = dotc(x, y)

Description

The ?dotC functions perform a vector-vector operation defined as

57


where x and y are n-element vectors.

Input Parameters


COMPLEX for cdotcxDOUBLE COMPLEX for zdotcArray, DIMENSION at least (1 + (n -1)*abs(incx)).


COMPLEX for cdotcyDOUBLE COMPLEX for zdotcArray, DIMENSION at least (1 + (n -1)*abs(incy)).


Output Parameters

COMPLEX for cdotcresDOUBLE COMPLEX for zdotcContains the result of the dot product of the conjugated xand unconjugated y, if n is positive. Otherwise, res contains0.



Specific details for the routine dotc interface are the following:



58


?dotuComputes a vector-vector dot product.

Syntax

Fortran 77:

res = cdotu(n, x, incx, y, incy )

res = zdotu(n, x, incx, y, incy )

Fortran 95:

res = dotu(x, y)

Description

The ?dotu functions perform a vector-vector reduction operation defined as

where x and y are n-element complex vectors.

Input Parameters


COMPLEX for cdotuxDOUBLE COMPLEX for zdotuArray, DIMENSION at least (1 + (n -1)*abs(incx)).


COMPLEX for cdotuyDOUBLE COMPLEX for zdotuArray, DIMENSION at least (1 + (n -1)*abs(incy)).


Output Parameters

COMPLEX for cdoturesDOUBLE COMPLEX for zdotuContains the result of the dot product of x and y, if n ispositive. Otherwise, res contains 0.

59




Specific details for the routine dotu interface are the following:



?nrm2Computes the Euclidean norm of a vector.

Syntax

Fortran 77:

res = snrm2(n, x, incx )

res = dnrm2(n, x, incx )

res = scnrm2(n, x, incx )

res = dznrm2(n, x, incx )

Fortran 95:

res = nrm2(x)

Description

The ?nrm2 functions perform a vector reduction operation defined as

res = | |x| |,

where:

x is a vector

res is a value containing the Euclidean norm of the elements of x.

Input Parameters


REAL for snrm2x

60


DOUBLE PRECISION for dnrm2COMPLEX for scnrm2DOUBLE COMPLEX for dznrm2Array, DIMENSION at least (1 + (n -1)*abs (incx)).


Output Parameters

REAL for snrm2resDOUBLE PRECISION for dnrm2REAL for scnrm2DOUBLE PRECISION for dznrm2Contains the Euclidean norm of the vector x.



Specific details for the routine nrm2 interface are the following:


?rotPerforms rotation of points in the plane.

Syntax

Fortran 77:

call srot(n, x, incx, y, incy, c, s )

call drot(n, x, incx, y, incy, c, s )

call csrot(n, x, incx, y, incy, c, s )

call zdrot(n, x, incx, y, incy, c, s )

Fortran 95:

call rot(x, y [,c] [,s])

61


Description

Given two complex vectors x and y, each vector element of these vectors is replaced as follows:

x(i) = c*x(i) + s*y(i)

y(i) = c*y(i) - s*x(i)

Input Parameters


REAL for srotxDOUBLE PRECISION for drotCOMPLEX for csrotDOUBLE COMPLEX for zdrotArray, DIMENSION at least (1 + (n -1)*abs(incx)).


REAL for srotyDOUBLE PRECISION for drotCOMPLEX for csrotDOUBLE COMPLEX for zdrotArray, DIMENSION at least (1 + (n -1)*abs(incy)).


REAL for srotcDOUBLE PRECISION for drotREAL for csrotDOUBLE PRECISION for zdrotA scalar.

REAL for srotsDOUBLE PRECISION for drotREAL for csrotDOUBLE PRECISION for zdrotA scalar.

Output Parameters

Each element is replaced by c*x + s*y.x

Each element is replaced by c*y - s*x.y

62




Specific details for the routine rot interface are the following:



The default value is 1.c

The default value is 1.s

?rotgComputes the parameters for a Givens rotation.

Syntax

Fortran 77:

call srotg(a, b, c, s )

call drotg(a, b, c, s )

call crotg(a, b, c, s )

call zrotg(a, b, c, s )

Fortran 95:

call rotg(a, b, c, s)

Description

Given the cartesian coordinates (a, b) of a point p, these routines return the parameters a,b, c, and s associated with the Givens rotation that zeros the y-coordinate of the point.

See a more accurate LAPACK version ?lartg.

Input Parameters

REAL for srotgaDOUBLE PRECISION for drotgCOMPLEX for crotg

63


DOUBLE COMPLEX for zrotgProvides the x-coordinate of the point p.

REAL for srotgbDOUBLE PRECISION for drotgCOMPLEX for crotgDOUBLE COMPLEX for zrotgProvides the y-coordinate of the point p.

Output Parameters

Contains the parameter r associated with the Givensrotation.

a

Contains the parameter z associated with the Givensrotation.

b

REAL for srotgcDOUBLE PRECISION for drotgREAL for crotgDOUBLE PRECISION for zrotgContains the parameter c associated with the Givensrotation.

REAL for srotgsDOUBLE PRECISION for drotgCOMPLEX for crotgDOUBLE COMPLEX for zrotgContains the parameter s associated with the Givensrotation.

?rotmPerforms rotation of points in the modified plane.

Syntax

Fortran 77:

call srotm(n, x, incx, y, incy, param )

call drotm(n, x, incx, y, incy, param )

64


Fortran 95:

call rotm(x, y [,param])

Description

Given two complex vectors x and y, each vector element of these vectors is replaced as follows:

x(i) = H*x(i) + H*y(i)

y(i) = H*y(i) - H*x(i)

where:

H is a modified Givens transformation matrix whose values are stored in the param(2) throughparam(5) array. See discussion on the param argument.

Input Parameters


REAL for srotmxDOUBLE PRECISION for drotmArray, DIMENSION at least (1 + (n -1)*abs(incx)).


REAL for srotmyDOUBLE PRECISION for drotmArray, DIMENSION at least (1 + (n -1)*abs(incy)).


REAL for srotmparamDOUBLE PRECISION for drotmArray, DIMENSION 5.The elements of the param array are:param(1) contains a switch, flag. param(2-5) contain h11,h21, h12, and h22, respectively, the components of the arrayH.Depending on the values of flag, the components of H areset as follows:

65


In the above cases, the matrix entries of 1., -1., and 0. areassumed based on the last three values of flag and are notactually loaded into the param vector.

Output Parameters

Each element is replaced by h11*x + h12*y.x

Each element is replaced by h21*x + h22*y.y

Givens transformation matrix updated.H



Specific details for the routine rotm interface are the following:

66




The default value for param(1) is -2.param

?rotmgComputes the modified parameters for a Givensrotation.

Syntax

Fortran 77:

call srotmg(d1, d2, x1, y1, param)

call drotmg(d1, d2, x1, y1, param)

Fortran 95:

call rotmg(x1, y1, param [, d1] [d2])

Description

Given cartesian coordinates (x1, y1) of an input vector, these routines compute the componentsof a modified Givens transformation matrix H that zeros the y-component of the resulting vector:

Input Parameters

REAL for srotmgd1DOUBLE PRECISION for drotmgProvides the updated scaling factor for the x-coordinate ofthe input vector (sqrt(d1)x1).

REAL for srotmgd2DOUBLE PRECISION for drotmg

67


Provides the updated scaling factor for the y-coordinate ofthe input vector (sqrt(d2)y1).

REAL for srotmgx1DOUBLE PRECISION for drotmgProvides the rotated x-coordinate of the input vector.

REAL for srotmgy1DOUBLE PRECISION for drotmgProvides the y-coordinate of the input vector.

Output Parameters

REAL for srotmgparamDOUBLE PRECISION for drotmgArray, DIMENSION 5.The elements of the param array are:param(1) contains a switch, flag. param(2-5) contain h11,h21, h12, and h22, respectively, the components of the arrayH.Depending on the values of flag, the components of H areset as follows:

68


In the above cases, the matrix entries of 1., -1., and 0. areassumed based on the last three values of flag and are notactually loaded into the param vector.



Specific details for the routine rotmg interface are the following:

The default value is 1.d1

The default value is 1.d2

?scalComputes a vector by a scalar product.

Syntax

Fortran 77:

call sscal(n, a, x, incx )

call dscal(n, a, x, incx )

call cscal(n, a, x, incx )

call zscal(n, a, x, incx )

call csscal(n, a, x, incx )

call zdscal(n, a, x, incx )

Fortran 95:

call scal(x, a)

69


Description

The ?scal routines perform a vector-vector operation defined as

x = a*x

where:

a is a scalar, x is an n-element vector.

Input Parameters


REAL for sscal and csscalaDOUBLE PRECISION for dscal and zdscalCOMPLEX for cscalDOUBLE COMPLEX for zscalSpecifies the scalar a.

REAL for sscalxDOUBLE PRECISION for dscalCOMPLEX for cscal and csscalDOUBLE COMPLEX for zscal and zdscalArray, DIMENSION at least (1 + (n -1)*abs(incx)).


Output Parameters

Overwritten by the updated vector x.x



Specific details for the routine scal interface are the following:


NOTE. Note that scalar parameter a is declared as a required parameter in Fortran-95interface for the routine scal to distinguish between routine flavors that operate ondifferent data types.

70


?swapSwaps a vector with another vector.

Syntax

Fortran 77:

call sswap(n, x, incx, y, incy )

call dswap(n, x, incx, y, incy )

call cswap(n, x, incx, y, incy )

call zswap(n, x, incx, y, incy )

Fortran 95:

call swap(x, y)

Description

Given the two complex vectors x and y, the ?swap routines return vectors y and x swapped,each replacing the other.

Input Parameters


REAL for sswapxDOUBLE PRECISION for dswapCOMPLEX for cswapDOUBLE COMPLEX for zswapArray, DIMENSION at least (1 + (n -1)*abs(incx)).


REAL for sswapyDOUBLE PRECISION for dswapCOMPLEX for cswapDOUBLE COMPLEX for zswapArray, DIMENSION at least (1 + (n -1)*abs(incy)).


71


Output Parameters

Contains the resultant vector x.x

Contains the resultant vector y.y



Specific details for the routine swap interface are the following:



i?amaxFinds the element of a vector that has the largestabsolute value.

Syntax

Fortran 77:

res = isamax(n, x, incx )

index = idamax(n, x, incx )

index = icamax(n, x, incx )

index = izamax(n, x, incx )

Fortran 95:

res = iamax(x)

Description

Given a vector x, the i?amax functions return the position of the vector element x(i) that hasthe largest absolute value for real flavors, or the largest sum |Rex(i)|+|Imx(i)| for complexflavors.

If n is not positive, 0 is returned.

72


If more than one vector element is found with the same largest absolute value, the index ofthe first one encountered is returned.

Input Parameters

INTEGER. Specifies the order of the vector x.n

REAL for isamaxxDOUBLE PRECISION for idamaxCOMPLEX for icamaxDOUBLE COMPLEX for izamaxArray, DIMENSION at least (1+(n-1)*abs(incx)).


Output Parameters

INTEGER. Contains the position of vector element x that hasthe largest absolute value.

index



Specific details for the routine amax interface are the following:

Holds the vector of length (n).x

i?aminFinds the element of a vector that has the smallestabsolute value.

Syntax

Fortran 77:

index = isamin(n, x, incx )

index = idamin(n, x, incx )

index = icamin(n, x, incx )

index = izamin(n, x, incx )

73


Fortran 95:

res = iamin(x)

Description

Given a vector x, the i?amin functions return the position of the vector element x(i) that hasthe smallest absolute value for real dlavors, or the smallest sum |Rex(i)|+|Imx(i)| forcomplex flavors.

If n is not positive, 0 is returned.

If more than one vector element is found with the same smallest absolute value, the index ofthe first one encountered is returned.

Input Parameters

INTEGER. On entry, n specifies the order of the vector x.n

REAL for isaminxDOUBLE PRECISION for idaminCOMPLEX for icaminDOUBLE COMPLEX for izaminArray, DIMENSION at least (1+(n-1)*abs(incx)).


Output Parameters

INTEGER. Contains the position of vector element x that hasthe smallest absolute value.

index



Specific details for the routine amin interface are the following:


74


dcabs1Computes absolute value of double complexnumber.

Syntax

Fortran 77:

res = dcabs1(z)

Fortran 95:

res = dcabs1(z)

Description

The dcabs1 is an auxiliary routine for a few BLAS Level 1 routines. This function performs anoperation defined as

res=|Re(z)|+|Im(z)|,

where z is a scalar and res is a value containing the absolute value of a double complex numberz.

Input Parameters

DOUBLE COMPLEX scalar.z

Output Parameters

DOUBLE PRECISION.resContains the absolute value of a double complex number z.

BLAS Level 2 RoutinesThis section describes BLAS Level 2 routines, which perform matrix-vector operations. Table2-2 lists the BLAS Level 2 routine groups and the data types associated with them.


DescriptionData TypesRoutine Groups

Matrix-vector product using a general band matrixs, d, c, z?gbmv

75



Matrix-vector product using a general matrixs, d, c, z?gemv

Rank-1 update of a general matrixs, d?ger

Rank-1 update of a conjugated general matrixc, z?gerc

Rank-1 update of a general matrix, unconjugatedc, z?geru

Matrix-vector product using a Hermitian band matrixc, z?hbmv

Matrix-vector product using a Hermitian matrixc, z?hemv

Rank-1 update of a Hermitian matrixc, z?her

Rank-2 update of a Hermitian matrixc, z?her2

Matrix-vector product using a Hermitian packedmatrix

c, z?hpmv

Rank-1 update of a Hermitian packed matrixc, z?hpr

Rank-2 update of a Hermitian packed matrixc, z?hpr2

Matrix-vector product using symmetric band matrixs, d?sbmv

Matrix-vector product using a symmetric packedmatrix

s, d?spmv

Rank-1 update of a symmetric packed matrixs, d?spr

Rank-2 update of a symmetric packed matrixs, d?spr2

Matrix-vector product using a symmetric matrixs, d?symv

Rank-1 update of a symmetric matrixs, d?syr

Rank-2 update of a symmetric matrixs, d?syr2

Matrix-vector product using a triangular band matrixs, d, c, z?tbmv

Linear solution of a triangular band matrixs, d, c, z?tbsv

76



Matrix-vector product using a triangular packedmatrix

s, d, c, z?tpmv

Linear solution of a triangular packed matrixs, d, c, z?tpsv

Matrix-vector product using a triangular matrixs, d, c, z?trmv

Linear solution of a triangular matrixs, d, c, z?trsv

?gbmvComputes a matrix-vector product using a generalband matrix

Syntax

Fortran 77:

call sgbmv(trans, m, n, kl, ku, alpha, a, lda, x, inxc, beta, y, incy )

call dgbmv(trans, m, n, kl, ku, alpha, a, lda, x, incx, beta, y, incy )

call cgbmv(trans, m, n, kl, ku, alpha, a, lda, x, incx, beta, y, incy )

call zgbmv(trans, m, n, kl, ku, alpha, a, lda, x, incx, beta, y, incy )

Fortran 95:

call gbmv(a, x, y [,kl][,m] [,alpha][,beta] [,trans])

Description

The ?gbmv routines perform a matrix-vector operation defined as

y := alpha*A*x + beta*y

or

y := alpha*A'*x + beta*y,

or

y := alpha *conjg(A')*x + beta*y,

where:

77


alpha and beta are scalars,

x and y are vectors,

A is an m-by-n band matrix, with kl sub-diagonals and ku super-diagonals.

Input Parameters

CHARACTER*1. Specifies the operation to be performed, asfollows:

trans

Operation to be Performedtrans value

y := alpha*A*x + beta*yN or n

y := alpha*A'*x + beta*yT or t

y := alpha *conjg(A')*x + beta*yC or c

INTEGER. Specifies the number of rows of the matrix A.mThe value of m must be at least zero.

INTEGER. Specifies the number of columns of the matrix A.nThe value of n must be at least zero.

INTEGER. Specifies the number of sub-diagonals of thematrix A.

kl

The value of kl must satisfy 0 ≤ kl.

INTEGER. Specifies the number of super-diagonals of thematrix A.

ku

The value of ku must satisfy 0 ≤ ku.

REAL for sgbmvalphaDOUBLE PRECISION for dgbmvCOMPLEX for cgbmvDOUBLE COMPLEX for zgbmvSpecifies the scalar alpha.

REAL for sgbmvaDOUBLE PRECISION for dgbmvCOMPLEX for cgbmvDOUBLE COMPLEX for zgbmvArray, DIMENSION (lda, n).Before entry, the leading (kl + ku + 1) by n part of thearray a must contain the matrix of coefficients. This matrixmust be supplied column-by-column, with the leading

78


diagonal of the matrix in row (ku + 1) of the array, thefirst super-diagonal starting at position 2 in row ku, the firstsub-diagonal starting at position 1 in row (ku + 2), andso on. Elements in the array a that do not correspond toelements in the band matrix (such as the top left ku by kutriangle) are not referenced.The following program segment transfers a band matrixfrom conventional full matrix storage to band storage:

do 20, j = 1, n

k = ku + 1 - j

do 10, i = max(1, j-ku), min(m,j+kl)

a(k+i, j) = matrix(i,j)

10 continue

20 continue

INTEGER. Specifies the first dimension of a as declared inthe calling (sub)program. The value of lda must be at least(kl + ku + 1).

lda

REAL for sgbmvxDOUBLE PRECISION for dgbmvCOMPLEX for cgbmvDOUBLE COMPLEX for zgbmvArray, DIMENSION at least (1 + (n - 1)*abs(incx))when trans = 'N' or 'n', and at least (1 + (m -1)*abs(incx)) otherwise. Before entry, the incrementedarray x must contain the vector x.

INTEGER. Specifies the increment for the elements of x.incx must not be zero.

incx

REAL for sgbmvbetaDOUBLE PRECISION for dgbmvCOMPLEX for cgbmvDOUBLE COMPLEX for zgbmvSpecifies the scalar beta. When beta is equal to zero, theny need not be set on input.

REAL for sgbmvy

79


DOUBLE PRECISION for dgbmvCOMPLEX for cgbmvDOUBLE COMPLEX for zgbmvArray, DIMENSION at least (1 +(m - 1)*abs(incy)) whentrans = 'N' or 'n' and at least (1 +(n -1)*abs(incy)) otherwise. Before entry, the incrementedarray y must contain the vector y.

INTEGER. Specifies the increment for the elements of y.incyThe value of incy must not be zero.

Output Parameters

Overwritten by the updated vector y.y



Specific details for the routine gbmv interface are the following:

Holds the array a of size (kl+ku+1,n).a

Holds the vector of length (rx), where rx = n if trans = 'N',rx= m otherwise.

x

Holds the vector of length (ry), where ry = m if trans = 'N',ry= n otherwise.

y

Must be 'N', 'C', or 'T'.transThe default value is 'N'.

If omitted, assumed kl= ku.kl

Restored as ku= lda-kl-1.ku

If omitted, assumed m= n.m

The default value is 1.alpha

The default value is 1.beta

80


?gemvComputes a matrix-vector product using a generalmatrix

Syntax

Fortran 77:

call sgemv(trans, m, n, alpha, a, lda, x, incx, beta, y, incy )

call dgemv(trans, m, n, alpha, a, lda, x, incx, beta, y, incy )

call cgemv(trans, m, n, alpha, a, lda, x, incx, beta, y, incy )

call zgemv(trans, m, n, alpha, a, lda, x, incx, beta, y, incy )

Fortran 95:

call gemv(a, x, y [,alpha][,beta] [,trans])

Description

The ?gemv routines perform a matrix-vector operation defined as

y := alpha*A*x + beta*y,

or


or

y := alpha*conjg(A')*x + beta*y,

where:



A is an m-by-n matrix.

Input Parameters


trans


y := alpha*A*x + beta*yN or n

81


y := alpha*A'*x + beta*yT or t

y := alpha *conjg(A')*x + beta*yC or c

INTEGER. Specifies the number of rows of the matrix A. Thevalue of m must be at least zero.

m

INTEGER. Specifies the number of columns of the matrix A.The value of n must be at least zero.

n

REAL for sgemvalphaDOUBLE PRECISION for dgemvCOMPLEX for cgemvDOUBLE COMPLEX for zgemvSpecifies the scalar alpha.

REAL for sgemvaDOUBLE PRECISION for dgemvCOMPLEX for cgemvDOUBLE COMPLEX for zgemvArray, DIMENSION (lda, n). Before entry, the leadingm-by-n part of the array a must contain the matrix ofcoefficients.

INTEGER. Specifies the first dimension of a as declared inthe calling (sub)program. The value of lda must be at leastmax(1, m).

lda

REAL for sgemvxDOUBLE PRECISION for dgemvCOMPLEX for cgemvDOUBLE COMPLEX for zgemvArray, DIMENSION at least (1+(n-1)*abs(incx)) whentrans = 'N' or 'n' and at least (1+(m - 1)*abs(incx))otherwise. Before entry, the incremented array x mustcontain the vector x.

INTEGER. Specifies the increment for the elements of x.incxThe value of incx must not be zero.

REAL for sgemvbetaDOUBLE PRECISION for dgemvCOMPLEX for cgemvDOUBLE COMPLEX for zgemv

82


Specifies the scalar beta. When beta is set to zero, then yneed not be set on input.

REAL for sgemvyDOUBLE PRECISION for dgemvCOMPLEX for cgemvDOUBLE COMPLEX for zgemvArray, DIMENSION at least (1 +(m - 1)*abs(incy)) whentrans = 'N' or 'n' and at least (1 +(n -1)*abs(incy)) otherwise. Before entry with non-zero beta, the incremented array y must contain the vector y.


Output Parameters




Specific details for the routine gemv interface are the following:

Holds the matrix A of size (m,n).a

Holds the vector of length (rx) where rx = n if trans = 'N',rx = m otherwise.

x

Holds the vector of length (ry) where ry = m if trans = 'N',ry = n otherwise.

y




83


?gerPerforms a rank-1 update of a general matrix.

Syntax

Fortran 77:

call sger(m, n, alpha, x, incx, y, incy, a, lda )

call dger(m, n, alpha, x, incx, y, incy, a, lda )

Fortran 95:

call ger(a, x, y [,alpha])

Description

The ?ger routines perform a matrix-vector operation defined as

A := alpha*x*y'+ A,

where:

alpha is a scalar,

x is an m-element vector,

y is an n-element vector,


Input Parameters



REAL for sgeralphaDOUBLE PRECISION for dgerSpecifies the scalar alpha.

REAL for sgerxDOUBLE PRECISION for dger

84


Array, DIMENSION at least (1 + (m - 1)*abs(incx)).Before entry, the incremented array x must contain them-element vector x.


REAL for sgeryDOUBLE PRECISION for dgerArray, DIMENSION at least (1 + (n - 1)*abs(incy)).Before entry, the incremented array y must contain then-element vector y.


REAL for sgeraDOUBLE PRECISION for dgerArray, DIMENSION (lda, n).Before entry, the leading m-by-n part of the array a mustcontain the matrix of coefficients.


lda

Output Parameters

Overwritten by the updated matrix.a



Specific details for the routine ger interface are the following:


Holds the vector of length (m).x



85


?gercPerforms a rank-1 update (conjugated) of a generalmatrix.

Syntax

Fortran 77:

call cgerc(m, n, alpha, x, incx, y, incy, a, lda )

call zgerc(m, n, alpha, x, incx, y, incy, a, lda )

Fortran 95:

call gerc(a, x, y [,alpha])

Description

The ?gerc routines perform a matrix-vector operation defined as

A := alpha*x*conjg(y') + A,

where:

alpha is a scalar,




Input Parameters



SINGLE PRECISION COMPLEX for cgercalphaDOUBLE PRECISION COMPLEX for zgercSpecifies the scalar alpha.

SINGLE PRECISION COMPLEX for cgercxDOUBLE PRECISION COMPLEX for zgerc

86




COMPLEX for cgercyDOUBLE COMPLEX for zgercArray, DIMENSION at least (1 + (n - 1)*abs(incy)).Before entry, the incremented array y must contain then-element vector y.


COMPLEX for cgercaDOUBLE COMPLEX for zgercArray, DIMENSION (lda, n).Before entry, the leading m-by-n part of the array a mustcontain the matrix of coefficients.


lda

Output Parameters




Specific details for the routine gerc interface are the following:





87


?geruPerforms a rank-1 update (unconjugated) of ageneral matrix.

Syntax

Fortran 77:

call cgeru(m, n, alpha, x, incx, y, incy, a, lda )

call zgeru(m, n, alpha, x, incx, y, incy, a, lda )

Fortran 95:

call geru(a, x, y [,alpha])

Description

The ?geru routines perform a matrix-vector operation defined as

A := alpha*x*y ' + A,

where:

alpha is a scalar,




Input Parameters



COMPLEX for cgerualphaDOUBLE COMPLEX for zgeruSpecifies the scalar alpha.

COMPLEX for cgeruxDOUBLE COMPLEX for zgeru

88




COMPLEX for cgeruyDOUBLE COMPLEX for zgeruArray, DIMENSION at least (1 + (n - 1)*abs(incy)).Before entry, the incremented array y must contain then-element vector y.


COMPLEX for cgeruaDOUBLE COMPLEX for zgeruArray, DIMENSION (lda, n).Before entry, the leading m-by-n part of the array a mustcontain the matrix of coefficients.


lda

Output Parameters




Specific details for the routine geru interface are the following:





89


?hbmvComputes a matrix-vector product using aHermitian band matrix.

Syntax

Fortran 77:

call chbmv(uplo, n, k, alpha, a, lda, x, incx, beta, y, incy )

call zhbmv(uplo, n, k, alpha, a, lda, x, incx, beta, y, incy )

Fortran 95:

call hbmv(a, x, y [,uplo][,alpha] [,beta])

Description

The ?hbmv routines perform a matrix-vector operation defined as


where:


x and y are n-element vectors,

A is an n-by-n Hermitian band matrix, with k super-diagonals.

Input Parameters

CHARACTER*1. Specifies whether the upper or lowertriangular part of the band matrix A is being supplied, asfollows:

uplo

Part of Matrix A Supplieduplo value

The upper triangular part of matrix A isbeing supplied.

U or u

The lower triangular part of matrix A isbeing supplied.

L or l

INTEGER. Specifies the order of the matrix A. The value ofn must be at least zero.

n

90



k

The value of k must satisfy 0 ≤ k.

COMPLEX for chbmvalphaDOUBLE COMPLEX for zhbmvSpecifies the scalar alpha.

COMPLEX for chbmvaDOUBLE COMPLEX for zhbmvArray, DIMENSION (lda, n).Before entry with uplo = 'U' or 'u', the leading (k + 1)by n part of the array a must contain the upper triangularband part of the Hermitian matrix. The matrix must besupplied column-by-column, with the leading diagonal ofthe matrix in row (k + 1) of the array, the firstsuper-diagonal starting at position 2 in row k, and so on.The top left k by k triangle of the array a is not referenced.The following program segment transfers the uppertriangular part of a Hermitian band matrix from conventionalfull matrix storage to band storage:

do 20, j = 1, n

m = k + 1 - j

do 10, i = max(1, j - k), j

a(m + i, j) = matrix(i, j)

10 continue

20 continue

Before entry with uplo = 'L' or 'l', the leading (k + 1)by n part of the array a must contain the lower triangularband part of the Hermitian matrix, suppliedcolumn-by-column, with the leading diagonal of the matrixin row 1 of the array, the first sub-diagonal starting atposition 1 in row 2, and so on. The bottom right k by ktriangle of the array a is not referenced.

91


The following program segment transfers the lowertriangular part of a Hermitian band matrix from conventionalfull matrix storage to band storage:

do 20, j = 1, n

m = 1 - j

do 10, i = j, min( n, j + k )

a( m + i, j ) = matrix( i, j )

10 continue

20 continue

The imaginary parts of the diagonal elements need not beset and are assumed to be zero.

INTEGER. Specifies the first dimension of a as declared inthe calling (sub)program. The value of lda must be at least(k + 1).

lda

COMPLEX for chbmvxDOUBLE COMPLEX for zhbmvArray, DIMENSION at least (1 + (n - 1)*abs(incx)).Before entry, the incremented array x must contain thevector x.


COMPLEX for chbmvbetaDOUBLE COMPLEX for zhbmvSpecifies the scalar beta.

COMPLEX for chbmvyDOUBLE COMPLEX for zhbmvArray, DIMENSION at least (1 + (n - 1)*abs(incy)).Before entry, the incremented array y must contain thevector y.


Output Parameters


92




Specific details for the routine hbmv interface are the following:

Holds the array a of size (k+1,n).a



Must be 'U' or 'L'. The default value is 'U'.uplo



?hemvComputes a matrix-vector product using aHermitian matrix.

Syntax

Fortran 77:

call chemv(uplo, n, alpha, a, lda, x, incx, beta, y, incy )

call zhemv(uplo, n, alpha, a, lda, x, incx, beta, y, incy )

Fortran 95:

call hemv(a, x, y [,uplo][,alpha] [,beta])

Description

The ?hemv routines perform a matrix-vector operation defined as


where:



A is an n-by-n Hermitian matrix.

93


Input Parameters

CHARACTER*1. Specifies whether the upper or lowertriangular part of the array a is to be referenced, as follows:

uplo

Part of Array a To Be Referenceduplo value

The upper triangular part of array a is tobe referenced.

U or u

The lower triangular part of array a is tobe referenced.

L or l


n

COMPLEX for chemvalphaDOUBLE COMPLEX for zhemvSpecifies the scalar alpha.

COMPLEX for chemvaDOUBLE COMPLEX for zhemvArray, DIMENSION (lda, n).Before entry with uplo = 'U' or 'u', the leading n-by-nupper triangular part of the array a must contain the uppertriangular part of the Hermitian matrix and the strictly lowertriangular part of a is not referenced. Before entry with uplo= 'L' or 'l', the leading n-by-n lower triangular part ofthe array a must contain the lower triangular part of theHermitian matrix and the strictly upper triangular part of ais not referenced.The imaginary parts of the diagonal elements need not beset and are assumed to be zero.

INTEGER. Specifies the first dimension of a as declared inthe calling (sub)program. The value of lda must be at leastmax(1, n).

lda

COMPLEX for chemvxDOUBLE COMPLEX for zhemvArray, DIMENSION at least (1 + (n - 1)*abs(incx)).Before entry, the incremented array x must contain then-element vector x.


94


The value of incx must not be zero.

COMPLEX for chemvbetaDOUBLE COMPLEX for zhemvSpecifies the scalar beta. When beta is supplied as zerothen y need not be set on input.

COMPLEX for chemvyDOUBLE COMPLEX for zhemvArray, DIMENSION at least (1 + (n - 1)*abs(incy)).Before entry, the incremented array y must contain then-element vector y.


Output Parameters




Specific details for the routine hemv interface are the following:

Holds the matrix A of size (n,n).a






95


?herPerforms a rank-1 update of a Hermitian matrix.

Syntax

Fortran 77:

call cher(uplo, n, alpha, x, incx, a, lda )

call zher(uplo, n, alpha, x, incx, a, lda )

Fortran 95:

call her(a, x [,uplo] [, alpha])

Description

The ?her routines perform a matrix-vector operation defined as

A := alpha*x*conjg(x') + A,

where:

alpha is a real scalar,

x is an n-element vector,


Input Parameters


uplo



U or u


L or l


n

REAL for cheralphaDOUBLE PRECISION for zherSpecifies the scalar alpha.

96


COMPLEX for cherxDOUBLE COMPLEX for zherArray, dimension at least (1 + (n - 1)*abs(incx)).Before entry, the incremented array x must contain then-element vector x.


COMPLEX for cheraDOUBLE COMPLEX for zherArray, DIMENSION (lda, n).Before entry with uplo = 'U' or 'u', the leading n-by-nupper triangular part of the array a must contain the uppertriangular part of the Hermitian matrix and the strictly lowertriangular part of a is not referenced.Before entry with uplo = 'L' or 'l', the leading n-by-nlower triangular part of the array a must contain the lowertriangular part of the Hermitian matrix and the strictly uppertriangular part of a is not referenced.The imaginary parts of the diagonal elements need not beset and are assumed to be zero.


lda

Output Parameters

With uplo = 'U' or 'u', the upper triangular part of thearray a is overwritten by the upper triangular part of theupdated matrix.

a

With uplo = 'L' or 'l', the lower triangular part of thearray a is overwritten by the lower triangular part of theupdated matrix.The imaginary parts of the diagonal elements are set tozero.

97




Specific details for the routine her interface are the following:





?her2Performs a rank-2 update of a Hermitian matrix.

Syntax

Fortran 77:

call cher2(uplo, n, alpha, x, incx, y, incy, a, lda )

call zher2(uplo, n, alpha, x, incx, y, incy, a, lda )

Fortran 95:

call her2(a, x, y [,uplo][,alpha])

Description

The ?her2 routines perform a matrix-vector operation defined as

A := alpha *x*conjg(y') + conjg(alpha)*y *conjg(x') + A,

where:

alpha is a scalar,



98


Input Parameters


uplo



U or u


L or l


n

COMPLEX for cher2alphaDOUBLE COMPLEX for zher2Specifies the scalar alpha.

COMPLEX for cher2xDOUBLE COMPLEX for zher2Array, DIMENSION at least (1 + (n - 1)*abs(incx)).Before entry, the incremented array x must contain then-element vector x.


COMPLEX for cher2yDOUBLE COMPLEX for zher2Array, DIMENSION at least (1 + (n - 1)*abs(incy)).Before entry, the incremented array y must contain then-element vector y.


COMPLEX for cher2aDOUBLE COMPLEX for zher2Array, DIMENSION (lda, n).Before entry with uplo = 'U' or 'u', the leading n-by-nupper triangular part of the array a must contain the uppertriangular part of the Hermitian matrix and the strictly lowertriangular part of a is not referenced.

99


Before entry with uplo = 'L' or 'l', the leading n-by-nlower triangular part of the array a must contain the lowertriangular part of the Hermitian matrix and the strictly uppertriangular part of a is not referenced.The imaginary parts of the diagonal elements need not beset and are assumed to be zero.


lda

Output Parameters


a

With uplo = 'L' or 'l', the lower triangular part of thearray a is overwritten by the lower triangular part of theupdated matrix.The imaginary parts of the diagonal elements are set tozero.



Specific details for the routine her2 interface are the following:






100


?hpmvComputes a matrix-vector product using aHermitian packed matrix.

Syntax

Fortran 77:

call chpmv(uplo, n, alpha, ap, x, incx, beta, y, incy )

call zhpmv(uplo, n, alpha, ap, x, incx, beta, y, incy )

Fortran 95:

call hpmv(a, x, y [,uplo][,alpha] [,beta])

Description

The ?hpmv routines perform a matrix-vector operation defined as


where:



A is an n-by-n Hermitian matrix, supplied in packed form.

Input Parameters

CHARACTER*1. Specifies whether the upper or lowertriangular part of the matrix A is supplied in the packedarray ap, as follows:

uplo


The upper triangular part of matrix A issupplied in ap.

U or u

The lower triangular part of matrix A issupplied in ap.

L or l


n

COMPLEX for chpmvalpha

101


DOUBLE COMPLEX for zhpmvSpecifies the scalar alpha.

COMPLEX for chpmvapDOUBLE COMPLEX for zhpmvArray, DIMENSION at least ((n*(n + 1))/2). Before entrywith uplo = 'U' or 'u', the array ap must contain theupper triangular part of the Hermitian matrix packedsequentially, column-by-column, so that ap(1) containsa(1, 1), ap(2) and ap(3) contain a(1, 2) and a(2, 2)respectively, and so on. Before entry with uplo = 'L' or'l', the array ap must contain the lower triangular part ofthe Hermitian matrix packed sequentially,column-by-column, so that ap(1) contains a(1, 1), ap(2)and ap(3) contain a(2, 1) and a(3, 1) respectively, andso on.The imaginary parts of the diagonal elements need not beset and are assumed to be zero.

COMPLEX for chpmvxDOUBLE PRECISION COMPLEX for zhpmvArray, DIMENSION at least (1 +(n - 1)*abs(incx)).Before entry, the incremented array x must contain then-element vector x.


COMPLEX for chpmvbetaDOUBLE COMPLEX for zhpmvSpecifies the scalar beta.When beta is equal to zero then y need not be set on input.

COMPLEX for chpmvyDOUBLE COMPLEX for zhpmvArray, DIMENSION at least (1 + (n - 1)*abs(incy)).Before entry, the incremented array y must contain then-element vector y.


102


Output Parameters




Specific details for the routine hpmv interface are the following:

Holds the array a of size (n*(n+1)/2).a






?hprPerforms a rank-1 update of a Hermitian packedmatrix.

Syntax

Fortran 77:

call chpr(uplo, n, alpha, x, incx, ap)

call zhpr(uplo, n, alpha, x, incx, ap)

Fortran 95:

call hpr(a, x [,uplo] [, alpha])

Description

The ?hpr routines perform a matrix-vector operation defined as

A := alpha*x*conjg(x') + A,

where:


103




Input Parameters


uplo



U or u


L or l


n

REAL for chpralphaDOUBLE PRECISION for zhprSpecifies the scalar alpha.

COMPLEX for chprxDOUBLE COMPLEX for zhprArray, DIMENSION at least (1 + (n - 1)*abs(incx)).Before entry, the incremented array x must contain then-element vector x.

INTEGER. Specifies the increment for the elements of x.incx must not be zero.

incx

COMPLEX for chprapDOUBLE COMPLEX for zhprArray, DIMENSION at least ((n*(n + 1))/2). Before entrywith uplo = 'U' or 'u', the array ap must contain theupper triangular part of the Hermitian matrix packedsequentially, column-by-column, so that ap(1) containsa(1, 1), ap(2) and ap(3) contain a(1, 2) and a(2, 2)respectively, and so on.

104


Before entry with uplo = 'L' or 'l', the array ap mustcontain the lower triangular part of the Hermitian matrixpacked sequentially, column-by-column, so that ap(1)contains a(1, 1), ap(2) and ap(3) contain a(2, 1) anda(3, 1) respectively, and so on.The imaginary parts of the diagonal elements need not beset and are assumed to be zero.

Output Parameters

With uplo = 'U' or 'u', overwritten by the uppertriangular part of the updated matrix.

ap

With uplo = 'L' or 'l', overwritten by the lower triangularpart of the updated matrix.The imaginary parts of the diagonal elements are set tozero.



Specific details for the routine hpr interface are the following:





?hpr2Performs a rank-2 update of a Hermitian packedmatrix.

Syntax

Fortran 77:

call chpr2(uplo, n, alpha, x, incx, y, incy, ap )

call zhpr2(uplo, n, alpha, x, incx, y, incy, ap )

105


Fortran 95:

call hpr2(a, x, y [,uplo][,alpha])

Description

The ?hpr2 routines perform a matrix-vector operation defined as

A := alpha*x*conjg(y') + conjg(alpha)*y*conjg(x') + A,

where:

alpha is a scalar,



Input Parameters

CHARACTER*1. Specifies whether the upper or lowertriangular part of the matrix A is supplied in the packedarray ap, as follows

uplo



U or u


L or l


n

COMPLEX for chpr2alphaDOUBLE COMPLEX for zhpr2Specifies the scalar alpha.

COMPLEX for chpr2xDOUBLE COMPLEX for zhpr2Array, dimension at least (1 +(n - 1)*abs(incx)). Beforeentry, the incremented array x must contain the n-elementvector x.


COMPLEX for chpr2y

106


DOUBLE COMPLEX for zhpr2Array, DIMENSION at least (1 +(n - 1)*abs(incy)).Before entry, the incremented array y must contain then-element vector y.


COMPLEX for chpr2apDOUBLE COMPLEX for zhpr2Array, DIMENSION at least ((n*(n + 1))/2). Before entrywith uplo = 'U' or 'u', the array ap must contain theupper triangular part of the Hermitian matrix packedsequentially, column-by-column, so that ap(1) containsa(1,1), ap(2) and ap(3) contain a(1,2) and a(2,2)respectively, and so on.Before entry with uplo = 'L' or 'l', the array ap mustcontain the lower triangular part of the Hermitian matrixpacked sequentially, column-by-column, so that ap(1)contains a(1,1), ap(2) and ap(3) contain a(2,1) anda(3,1) respectively, and so on.The imaginary parts of the diagonal elements need not beset and are assumed to be zero.

Output Parameters


ap

With uplo = 'L' or 'l', overwritten by the lower triangularpart of the updated matrix.The imaginary parts of the diagonal elements need are setto zero.



Specific details for the routine hpr2 interface are the following:


107






?sbmvComputes a matrix-vector product using asymmetric band matrix.

Syntax

Fortran 77:

call ssbmv(uplo, n, k, alpha, a, lda, x, incx, beta, y, incy )

call dsbmv(uplo, n, k, alpha, a, lda, x, incx, beta, y, incy )

Fortran 95:

call sbmv(a, x, y [,uplo][,alpha] [,beta])

Description

The ?sbmv routines perform a matrix-vector operation defined as


where:



A is an n-by-n symmetric band matrix, with k super-diagonals.

Input Parameters

CHARACTER*1. Specifies whether the upper or lowertriangular part of the band matrix A is being supplied, asfollows:

uplo


The upper triangular part of matrix A issupplied.

U or u

108


The lower triangular part of matrix A issupplied.

L or l


n


k


REAL for ssbmvalphaDOUBLE PRECISION for dsbmvSpecifies the scalar alpha.

REAL for ssbmvaDOUBLE PRECISION for dsbmvArray, DIMENSION (lda, n). Before entry with uplo ='U' or 'u', the leading (k + 1) by n part of the array amust contain the upper triangular band part of thesymmetric matrix, supplied column-by-column, with theleading diagonal of the matrix in row (k + 1) of the array,the first super-diagonal starting at position 2 in row k, andso on. The top left k by k triangle of the array a is notreferenced.The following program segment transfers the uppertriangular part of a symmetric band matrix from conventionalfull matrix storage to band storage:

do 20, j = 1, n

m = k + 1 - j

do 10, i = max( 1, j - k ), j


10 continue

20 continue

Before entry with uplo = 'L' or 'l', the leading (k + 1)by n part of the array a must contain the lower triangularband part of the symmetric matrix, suppliedcolumn-by-column, with the leading diagonal of the matrix

109


in row 1 of the array, the first sub-diagonal starting atposition 1 in row 2, and so on. The bottom right k by ktriangle of the array a is not referenced.The following program segment transfers the lowertriangular part of a symmetric band matrix from conventionalfull matrix storage to band storage:

do 20, j = 1, n

m = 1 - j

do 10, i = j, min( n, j + k )


10 continue

20 continue


lda

REAL for ssbmvxDOUBLE PRECISION for dsbmvArray, DIMENSION at least (1 + (n - 1)*abs(incx)).Before entry, the incremented array x must contain thevector x.


REAL for ssbmvbetaDOUBLE PRECISION for dsbmvSpecifies the scalar beta.

REAL for ssbmvyDOUBLE PRECISION for dsbmvArray, DIMENSION at least (1 + (n - 1)*abs(incy)).Before entry, the incremented array y must contain thevector y.


110


Output Parameters




Specific details for the routine sbmv interface are the following:







?spmvComputes a matrix-vector product using asymmetric packed matrix.

Syntax

Fortran 77:

call sspmv(uplo, n, alpha, ap, x, incx, beta, y, incy )

call dspmv(uplo, n, alpha, ap, x, incx, beta, y, incy )

Fortran 95:

call spmv(a, x, y [,uplo][,alpha] [,beta])

Description

The ?spmv routines perform a matrix-vector operation defined as


where:


111



A is an n-by-n symmetric matrix, supplied in packed form.

Input Parameters


uplo



U or u


L or l

INTEGER. Specifies the order of the matrix a. The value ofn must be at least zero.

n

REAL for sspmvalphaDOUBLE PRECISION for dspmvSpecifies the scalar alpha.

REAL for sspmvapDOUBLE PRECISION for dspmvArray, DIMENSION at least ((n*(n + 1))/2).Before entry with uplo = 'U' or 'u', the array ap mustcontain the upper triangular part of the symmetric matrixpacked sequentially, column-by-column, so that ap(1)contains a(1,1), ap(2) and ap(3) contain a(1,2) anda(2, 2) respectively, and so on. Before entry with uplo= 'L' or 'l', the array ap must contain the lower triangularpart of the symmetric matrix packed sequentially,column-by-column, so that ap(1) contains a(1,1), ap(2)and ap(3) contain a(2,1) and a(3,1) respectively, andso on.

REAL for sspmvxDOUBLE PRECISION for dspmvArray, DIMENSION at least (1 + (n - 1)*abs(incx)).Before entry, the incremented array x must contain then-element vector x.

112



REAL for sspmvbetaDOUBLE PRECISION for dspmvSpecifies the scalar beta.When beta is supplied as zero, then y need not be set oninput.

REAL for sspmvyDOUBLE PRECISION for dspmvArray, DIMENSION at least (1 + (n - 1)*abs(incy)).Before entry, the incremented array y must contain then-element vector y.


Output Parameters




Specific details for the routine spmv interface are the following:







113


?sprPerforms a rank-1 update of a symmetric packedmatrix.

Syntax

Fortran 77:

call sspr(uplo, n, alpha, x, incx, ap )

call dspr(uplo, n, alpha, x, incx, ap )

Fortran 95:

call spr(a, x [,uplo] [, alpha])

Description

The ?spr routines perform a matrix-vector operation defined as

a:= alpha*x*x'+ A,

where:




Input Parameters


uplo



U or u


L or l


n

REAL for sspralpha

114


DOUBLE PRECISION for dsprSpecifies the scalar alpha.

REAL for ssprxDOUBLE PRECISION for dsprArray, DIMENSION at least (1 + (n - 1)*abs(incx)).Before entry, the incremented array x must contain then-element vector x.


REAL for ssprapDOUBLE PRECISION for dsprArray, DIMENSION at least ((n*(n + 1))/2). Before entrywith uplo = 'U' or 'u', the array ap must contain theupper triangular part of the symmetric matrix packedsequentially, column-by-column, so that ap(1) containsa(1,1), ap(2) and ap(3) contain a(1,2) and a(2,2)respectively, and so on.Before entry with uplo = 'L' or 'l', the array ap mustcontain the lower triangular part of the symmetric matrixpacked sequentially, column-by-column, so that ap(1)contains a(1,1), ap(2) and ap(3) contain a(2,1) anda(3,1) respectively, and so on.

Output Parameters


ap

With uplo = 'L' or 'l', overwritten by the lower triangularpart of the updated matrix.



Specific details for the routine spr interface are the following:



115




?spr2Performs a rank-2 update of a symmetric packedmatrix.

Syntax

Fortran 77:

call sspr2(uplo, n, alpha, x, incx, y, incy, ap )

call dspr2(uplo, n, alpha, x, incx, y, incy, ap )

Fortran 95:

call spr2(a, x, y [,uplo][,alpha])

Description

The ?spr2 routines perform a matrix-vector operation defined as

A:= alpha*x*y'+ alpha*y*x' + A,

where:

alpha is a scalar,



Input Parameters


uplo



U or u


L or l

116



n

REAL for sspr2alphaDOUBLE PRECISION for dspr2Specifies the scalar alpha.

REAL for sspr2xDOUBLE PRECISION for dspr2Array, DIMENSION at least (1 + (n - 1)*abs(incx)).Before entry, the incremented array x must contain then-element vector x.


REAL for sspr2yDOUBLE PRECISION for dspr2Array, DIMENSION at least (1 + (n - 1)*abs(incy)).Before entry, the incremented array y must contain then-element vector y.

INTEGER. Specifies the increment for the elements of y. Thevalue of incy must not be zero.

incy

REAL for sspr2apDOUBLE PRECISION for dspr2Array, DIMENSION at least ((n*(n + 1))/2). Before entrywith uplo = 'U' or 'u', the array ap must contain theupper triangular part of the symmetric matrix packedsequentially, column-by-column, so that ap(1) containsa(1,1), ap(2) and ap(3) contain a(1,2) and a(2,2)respectively, and so on.Before entry with uplo = 'L' or 'l', the array ap mustcontain the lower triangular part of the symmetric matrixpacked sequentially, column-by-column, so that ap(1)contains a(1,1), ap(2) and ap(3) contain a (2,1) anda(3,1) respectively, and so on.

Output Parameters


ap

117





Specific details for the routine spr2 interface are the following:






?symvComputes a matrix-vector product for a symmetricmatrix.

Syntax

Fortran 77:

call ssymv(uplo, n, alpha, a, lda, x, incx, beta, y, incy )

call dsymv(uplo, n, alpha, a, lda, x, incx, beta, y, incy )

Fortran 95:

call symv(a, x, y [,uplo][,alpha] [,beta])

Description

The ?symv routines perform a matrix-vector operation defined as


where:



118


A is an n-by-n symmetric matrix.

Input Parameters


uplo



U or u


L or l


n

REAL for ssymvalphaDOUBLE PRECISION for dsymvSpecifies the scalar alpha.

REAL for ssymvaDOUBLE PRECISION for dsymvArray, DIMENSION (lda, n).Before entry with uplo = 'U' or 'u', the leading n-by-nupper triangular part of the array a must contain the uppertriangular part of the symmetric matrix A and the strictlylower triangular part of a is not referenced. Before entrywith uplo = 'L' or 'l', the leading n-by-n lower triangularpart of the array a must contain the lower triangular partof the symmetric matrix A and the strictly upper triangularpart of a is not referenced.


lda

REAL for ssymvxDOUBLE PRECISION for dsymvArray, DIMENSION at least (1 + (n - 1)*abs(incx)).Before entry, the incremented array x must contain then-element vector x.


119


The value of incx must not be zero.

REAL for ssymvbetaDOUBLE PRECISION for dsymvSpecifies the scalar beta.When beta is supplied as zero, then y need not be set oninput.

REAL for ssymvyDOUBLE PRECISION for dsymvArray, DIMENSION at least (1 + (n - 1)*abs(incy)).Before entry, the incremented array y must contain then-element vector y.


Output Parameters




Specific details for the routine symv interface are the following:







120


?syrPerforms a rank-1 update of a symmetric matrix.

Syntax

Fortran 77:

call ssyr(uplo, n, alpha, x, incx, a, lda )

call dsyr(uplo, n, alpha, x, incx, a, lda )

Fortran 95:

call syr(a, x [,uplo] [, alpha])

Description

The ?syr routines perform a matrix-vector operation defined as

A := alpha*x*x' + A ,

where:




Input Parameters


uplo



U or u


L or l


n

REAL for ssyralphaDOUBLE PRECISION for dsyr

121


Specifies the scalar alpha.

REAL for ssyrxDOUBLE PRECISION for dsyrArray, DIMENSION at least (1 + (n-1)*abs(incx)). Beforeentry, the incremented array x must contain the n-elementvector x.


REAL for ssyraDOUBLE PRECISION for dsyrArray, DIMENSION (lda, n).Before entry with uplo = 'U' or 'u', the leading n-by-nupper triangular part of the array a must contain the uppertriangular part of the symmetric matrix A and the strictlylower triangular part of a is not referenced.Before entry with uplo = 'L' or 'l', the leading n-by-nlower triangular part of the array a must contain the lowertriangular part of the symmetric matrix A and the strictlyupper triangular part of a is not referenced.


lda

Output Parameters


a

With uplo = 'L' or 'l', the lower triangular part of thearray a is overwritten by the lower triangular part of theupdated matrix.



Specific details for the routine syr interface are the following:

122






?syr2Performs a rank-2 update of symmetric matrix.

Syntax

Fortran 77:

call ssyr2(uplo, n, alpha, x, incx, y, incy, a, lda )

call dsyr2(uplo, n, alpha, x, incx, y, incy, a, lda )

Fortran 95:

call syr2(a, x, y [,uplo][,alpha])

Description

The ?syr2 routines perform a matrix-vector operation defined as

A := alpha*x*y'+ alpha*y*x' + A,

where:

alpha is a scalar,



Input Parameters


uplo



U or u

123



L or l


n

REAL for ssyr2alphaDOUBLE PRECISION for dsyr2Specifies the scalar alpha.

REAL for ssyr2xDOUBLE PRECISION for dsyr2Array, DIMENSION at least (1 + (n - 1)*abs(incx)).Before entry, the incremented array x must contain then-element vector x.


REAL for ssyr2yDOUBLE PRECISION for dsyr2Array, DIMENSION at least (1 + (n - 1)*abs(incy)).Before entry, the incremented array y must contain then-element vector y.


incy

REAL for ssyr2aDOUBLE PRECISION for dsyr2Array, DIMENSION (lda, n).Before entry with uplo = 'U' or 'u', the leading n-by-nupper triangular part of the array a must contain the uppertriangular part of the symmetric matrix and the strictly lowertriangular part of a is not referenced.Before entry with uplo = 'L' or 'l', the leading n-by-nlower triangular part of the array a must contain the lowertriangular part of the symmetric matrix and the strictly uppertriangular part of a is not referenced.


lda

124


Output Parameters


a




Specific details for the routine syr2 interface are the following:


Holds the vector x of length (n).x

Holds the vector y of length (n).y



?tbmvComputes a matrix-vector product using atriangular band matrix.

Syntax

Fortran 77:

call stbmv(uplo, trans, diag, n, k, a, lda, x, incx )

call dtbmv(uplo, trans, diag, n, k, a, lda, x, incx )

call ctbmv(uplo, trans, diag, n, k, a, lda, x, incx )

call ztbmv(uplo, trans, diag, n, k, a, lda, x, incx )

Fortran 95:

call tbmv(a, x [,uplo] [, trans] [,diag])

125


Description

The ?tbmv routines perform one of the matrix-vector operations defined as

x := A*x, or x := A'*x, or x := conjg(A')*x,

where:


A is an n-by-n unit, or non-unit, upper or lower triangular band matrix, with (k +1) diagonals.

Input Parameters

CHARACTER*1. Specifies whether the matrix is an upper orlower triangular matrix, as follows:

uplo

Matrix Auplo value

An upper triangular matrix.U or u

A lower triangular matrix.L or l


trans


x := A*xN or n

x := A'*xT or t

x := conjg(A')*xC or c

CHARACTER*1. Specifies whether or not A is unit triangular,as follows:

diag

Matrix adiag value

Matrix a is a unit triangular.U or u

Matrix a is not a unit triangular.N or n


n

INTEGER. On entry with uplo = 'U' or 'u', k specifies thenumber of super-diagonals of the matrix A. On entry withuplo = 'L' or 'l', k specifies the number of sub-diagonalsof the matrix a.

k


126


REAL for stbmvaDOUBLE PRECISION for dtbmvCOMPLEX for ctbmvDOUBLE COMPLEX for ztbmvArray, DIMENSION (lda, n).Before entry with uplo = 'U' or 'u', the leading (k + 1)by n part of the array a must contain the upper triangularband part of the matrix of coefficients, suppliedcolumn-by-column, with the leading diagonal of the matrixin row (k + 1) of the array, the first super-diagonal startingat position 2 in row k, and so on. The top left k by k triangleof the array a is not referenced. The following programsegment transfers an upper triangular band matrix fromconventional full matrix storage to band storage:

do 20, j = 1, n

m = k + 1 - j

do 10, i = max(1, j - k), j

a(m + i, j) = matrix(i, j)

10 continue

20 continue

Before entry with uplo = 'L' or 'l', the leading (k + 1)by n part of the array a must contain the lower triangularband part of the matrix of coefficients, suppliedcolumn-by-column, with the leading diagonal of the matrixin row1 of the array, the first sub-diagonal starting atposition 1 in row 2, and so on. The bottom right k by k

127


triangle of the array a is not referenced. The followingprogram segment transfers a lower triangular band matrixfrom conventional full matrix storage to band storage:

do 20, j = 1, n

m = 1 - j

do 10, i = j, min(n, j + k)

a(m + i, j) = matrix (i, j)

10 continue

20 continue

Note that when diag = 'U' or 'u', the elements of thearray a corresponding to the diagonal elements of the matrixare not referenced, but are assumed to be unity.


lda

REAL for stbmvxDOUBLE PRECISION for dtbmvCOMPLEX for ctbmvDOUBLE COMPLEX for ztbmvArray, DIMENSION at least (1 + (n - 1)*abs(incx)).Before entry, the incremented array x must contain then-element vector x.


Output Parameters

Overwritten with the transformed vector x.x



Specific details for the routine tbmv interface are the following:

128






Must be 'N' or 'U'. The default value is 'N'.diag

?tbsvSolves a system of linear equations whosecoefficients are in a triangular band matrix.

Syntax

Fortran 77:

call stbsv(uplo, trans, diag, n, k, a, lda, x, incx )

call dtbsv(uplo, trans, diag, n, k, a, lda, x, incx )

call ctbsv(uplo, trans, diag, n, k, a, lda, x, incx )

call ztbsv(uplo, trans, diag, n, k, a, lda, x, incx )

Fortran 95:

call tbsv(a, x [,uplo] [, trans] [,diag])

Description

The ?tbsv routines solve one of the following systems of equations:

A*x = b, or A'*x = b, or conjg(A')*x = b,

where:

b and x are n-element vectors,

A is an n-by-n unit, or non-unit, upper or lower triangular band matrix, with (k + 1) diagonals.

The routine does not test for singularity or near-singularity.

Such tests must be performed before calling this routine.

129


Input Parameters

CHARACTER*1. Specifies whether the matrix A is an upperor lower triangular matrix, as follows:

uplo

Matrix Auplo value




trans


A*x = bN or n

A'*x = bT or t

conjg(A')*x = bC or c


diag

Matrix Adiag value

Matrix A is a unit triangular.U or u

Matrix A is not a unit triangular.N or n


n

INTEGER. On entry with uplo = 'U' or 'u', k specifies thenumber of super-diagonals of the matrix A. On entry withuplo = 'L' or 'l', k specifies the number of sub-diagonalsof the matrix A.

k


REAL for stbsvaDOUBLE PRECISION for dtbsvCOMPLEX for ctbsvDOUBLE COMPLEX for ztbsvArray, DIMENSION (lda, n).Before entry with uplo = 'U' or 'u', the leading (k + 1)by n part of the array a must contain the upper triangularband part of the matrix of coefficients, supplied

130


column-by-column, with the leading diagonal of the matrixin row (k + 1) of the array, the first super-diagonal startingat position 2 in row k, and so on. The top left k by k triangleof the array a is not referenced.The following program segment transfers an upper triangularband matrix from conventional full matrix storage to bandstorage:

do 20, j = 1, n

m = k + 1 - j

do 10, i = max(1, j - k), jl


10 continue

20 continue

Before entry with uplo = 'L' or 'l', the leading (k + 1)by n part of the array a must contain the lower triangularband part of the matrix of coefficients, suppliedcolumn-by-column, with the leading diagonal of the matrixin row 1 of the array, the first sub-diagonal starting atposition 1 in row 2, and so on. The bottom right k by ktriangle of the array a is not referenced.The following program segment transfers a lower triangularband matrix from conventional full matrix storage to bandstorage:

do 20, j = 1, n

m = 1 - j

do 10, i = j, min(n, j + k)


10 continue

20 continue

When diag = 'U' or 'u', the elements of the array acorresponding to the diagonal elements of the matrix arenot referenced, but are assumed to be unity.

131



lda

REAL for stbsvxDOUBLE PRECISION for dtbsvCOMPLEX for ctbsvDOUBLE COMPLEX for ztbsvArray, DIMENSION at least (1 + (n - 1)*abs(incx)).Before entry, the incremented array x must contain then-element right-hand side vector b.


Output Parameters

Overwritten with the solution vector x.x



Specific details for the routine tbsv interface are the following:






132


?tpmvComputes a matrix-vector product using atriangular packed matrix.

Syntax

Fortran 77:

call stpmv(uplo, trans, diag, n, ap, x, incx )

call dtpmv(uplo, trans, diag, n, ap, x, incx )

call ctpmv(uplo, trans, diag, n, ap, x, incx )

call ztpmv(uplo, trans, diag, n, ap, x, incx )

Fortran 95:

call tpmv(a, x [,uplo] [, trans] [,diag])

Description

The ?tpmv routines perform one of the matrix-vector operations defined as


where:


A is an n-by-n unit, or non-unit, upper or lower triangular matrix, supplied in packed form.

Input Parameters


uplo

133


Matrix Auplo value




trans

Operation To Be Performedtrans value

x := A*xN or n

x := A'*xT or t



diag

Matrix Adiag value

Matrix A is assumed to be unit triangular.U or u

Matrix A is not assumed to be unittriangular.

N or n


n

REAL for stpmvapDOUBLE PRECISION for dtpmvCOMPLEX for ctpmvDOUBLE COMPLEX for ztpmvArray, DIMENSION at least ((n*(n + 1))/2). Before entrywith uplo = 'U' or 'u', the array ap must contain theupper triangular matrix packed sequentially,column-by-column, so that ap(1) contains a(1,1), ap(2)and ap(3) contain a(1,2) and a(2,2) respectively, andso on. Before entry with uplo = 'L' or 'l', the array apmust contain the lower triangular matrix packedsequentially, column-by-column, so that ap(1) containsa(1,1), ap(2) and ap(3) contain a(2,1) and a(3,1)respectively, and so on. When diag = 'U' or 'u', thediagonal elements of a are not referenced, but are assumedto be unity.

134


REAL for stpmvxDOUBLE PRECISION for dtpmvCOMPLEX for ctpmvDOUBLE COMPLEX for ztpmvArray, DIMENSION at least (1 + (n - 1)*abs(incx)).Before entry, the incremented array x must contain then-element vector x.


Output Parameters




Specific details for the routine tpmv interface are the following:






135


?tpsvSolves a system of linear equations whosecoefficients are in a triangular packed matrix.

Syntax

Fortran 77:

call stpsv(uplo, trans, diag, n, ap, x, incx )

call dtpsv(uplo, trans, diag, n, ap, x, incx )

call ctpsv(uplo, trans, diag, n, ap, x, incx )

call ztpsv(uplo, trans, diag, n, ap, x, incx )

Fortran 95:

call tpsv(a, x [,uplo] [, trans] [,diag])

Description

The ?tpsv routines solve one of the following systems of equations


where:


A is an n-by-n unit, or non-unit, upper or lower triangular matrix, supplied in packed form.

This routine does not test for singularity or near-singularity.


Input Parameters


uplo

Matrix Auplo value




trans

136



A*x = bN or n

A'*x = bT or t


CHARACTER*1. Specifies whether or not the matrix A is unittriangular, as follows:

diag

Matrix Adiag value



N or n


n

REAL for stpsvapDOUBLE PRECISION for dtpsvCOMPLEX for ctpsvDOUBLE COMPLEX for ztpsvArray, DIMENSION at least ((n*(n + 1))/2). Before entrywith uplo = 'U' or 'u', the array ap must contain theupper triangular matrix packed sequentially,column-by-column, so that ap(1) contains a(1, +1), ap(2)and ap(3) contain a(1, 2) and a(2, 2) respectively, andso on.Before entry with uplo = 'L' or 'l', the array ap mustcontain the lower triangular matrix packed sequentially,column-by-column, so that ap(1) contains a(1, +1), ap(2)and ap(3) contain a(2, +1) and a(3, +1) respectively,and so on.When diag = 'U' or 'u', the diagonal elements of a arenot referenced, but are assumed to be unity.

REAL for stpsvxDOUBLE PRECISION for dtpsvCOMPLEX for ctpsvDOUBLE COMPLEX for ztpsv

137


Array, DIMENSION at least (1 + (n - 1)*abs(incx)).Before entry, the incremented array x must contain then-element right-hand side vector b.


Output Parameters




Specific details for the routine tpsv interface are the following:






?trmvComputes a matrix-vector product using atriangular matrix.

Syntax

Fortran 77:

call strmv(uplo, trans, diag, n, a, lda, x, incx )

call dtrmv(uplo, trans, diag, n, a, lda, x, incx )

call ctrmv(uplo, trans, diag, n, a, lda, x, incx )

call ztrmv(uplo, trans, diag, n, a, lda, x, incx )

138


Fortran 95:

call trmv(a, x [,uplo] [, trans] [,diag])

Description

The ?trmv routines perform one of the following matrix-vector operations defined as


where:


A is an n-by-n unit, or non-unit, upper or lower triangular matrix.

Input Parameters


uplo

Matrix Auplo value




trans


x := A*xN or n

x := A'*xT or t



diag

Matrix Adiag value



N or n


n

REAL for strmva

139


DOUBLE PRECISION for dtrmvCOMPLEX for ctrmvDOUBLE COMPLEX for ztrmvArray, DIMENSION (lda,n). Before entry with uplo = 'U'or 'u', the leading n-by-n upper triangular part of the arraya must contain the upper triangular matrix and the strictlylower triangular part of a is not referenced. Before entrywith uplo = 'L' or 'l', the leading n-by-n lower triangularpart of the array a must contain the lower triangular matrixand the strictly upper triangular part of a is not referenced.When diag = 'U' or 'u', the diagonal elements of a arenot referenced either, but are assumed to be unity.


lda

REAL for strmvxDOUBLE PRECISION for dtrmvCOMPLEX for ctrmvDOUBLE COMPLEX for ztrmvArray, DIMENSION at least (1 + (n - 1)*abs(incx)).Before entry, the incremented array x must contain then-element vector x.


Output Parameters




Specific details for the routine trmv interface are the following:



140





?trsvSolves a system of linear equations whosecoefficients are in a triangular matrix.

Syntax

Fortran 77:

call strsv(uplo, trans, diag, n, a, lda, x, incx )

call dtrsv(uplo, trans, diag, n, a, lda, x, incx )

call ctrsv(uplo, trans, diag, n, a, lda, x, incx )

call ztrsv(uplo, trans, diag, n, a, lda, x, incx )

Fortran 95:

call trsv(a, x [,uplo] [, trans] [,diag])

Description

The ?trsv routines solve one of the systems of equations:


where:


A is an n-by-n unit, or non-unit, upper or lower triangular matrix.

The routine does not test for singularity or near-singularity.


Input Parameters

CHARACTER*1. Specifies whether the matrix is an upper orlower triangular matrix, as follows:

uplo

141


Matrix Auplo value




trans


A*x = bN or n

A'*x = bT or t



diag

Matrix adiag value

Matrix a is a unit triangular.U or u

Matrix a is not a unit triangular.N or n


n

REAL for strsvaDOUBLE PRECISION for dtrsvCOMPLEX for ctrsvDOUBLE COMPLEX for ztrsvArray, DIMENSION (lda,n). Before entry with uplo = 'U'or 'u', the leading n-by-n upper triangular part of the arraya must contain the upper triangular matrix and the strictlylower triangular part of a is not referenced. Before entrywith uplo = 'L' or 'l', the leading n-by-n lower triangularpart of the array a must contain the lower triangular matrixand the strictly upper triangular part of a is not referenced.When diag = 'U' or 'u', the diagonal elements of a arenot referenced either, but are assumed to be unity.


lda

REAL for strsvx

142


DOUBLE PRECISION for dtrsvCOMPLEX for ctrsvDOUBLE COMPLEX for ztrsvArray, DIMENSION at least (1 + (n - 1)*abs(incx)).Before entry, the incremented array x must contain then-element right-hand side vector b.


Output Parameters




Specific details for the routine trsv interface are the following:

Holds the matrix a of size (n,n).a





BLAS Level 3 RoutinesBLAS Level 3 routines perform matrix-matrix operations. Table 2-3 lists the BLAS Level 3routine groups and the data types associated with them.


DescriptionData TypesRoutine Group

Matrix-matrix product of general matricess, d, c, z?gemm

Matrix-matrix product of Hermitian matricesc, z?hemm

143


DescriptionData TypesRoutine Group

Rank-k update of Hermitian matricesc, z?herk

Rank-2k update of Hermitian matricesc, z?her2k

Matrix-matrix product of symmetric matricess, d, c, z?symm

Rank-k update of symmetric matricess, d, c, z?syrk

Rank-2k update of symmetric matricess, d, c, z?symm

Matrix-matrix product of triangular matricess, d, c, z?trmm

Linear matrix-matrix solution for triangular matricess, d, c, z?trsm

Symmetric Multiprocessing Version of Intel® MKL

Many applications spend considerable time for executing BLAS level 3 routines. This time canbe scaled by the number of processors available on the system through using the symmetricmultiprocessing(SMP) feature built into the Intel MKL Library. The performance enhancementsbased on the parallel use of the processors are available without any programming effort onyour part.

To enhance performance, the library uses the following methods:

• The operation of BLAS level 3 matrix-matrix functions permits to restructure the code in away which increases the localization of data reference, enhances cache memory use, andreduces the dependency on the memory bus.

• Once the code has been effectively blocked as described above, one of the matrices isdistributed across the processors to be multiplied by the second matrix. Such distributionensures effective cache management which reduces the dependency on the memory busperformance and brings good scaling results.

144


?gemmComputes a scalar-matrix-matrix product and addsthe result to a scalar-matrix product.

Syntax

Fortran 77:

call sgemm(transa, transb, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc)

call dgemm(transa, transb, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc)

call cgemm(transa, transb, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc)

call zgemm(transa, transb, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc)

Fortran 95:

call gemm(a, b, c [,transa][,transb] [,alpha][,beta])

Description

The ?gemm routines perform a matrix-matrix operation with general matrices. The operation isdefined as

C := alpha*op(A)*op(B) + beta*C,

where:

op(x) is one of op(x) = x, or op(x) = x', or op(x) = conjg(x'),


A, B and C are matrices:

op(A) is an m-by-k matrix,

op(B) is a k-by-n matrix,

C is an m-by-n matrix.

Input Parameters

CHARACTER*1. Specifies the form of op(A) to be used inthe matrix multiplication as follows:

transa

145


Form of op(a)transa value

op(A) = AN or n

op(A) = A'T or t

op(A) = conjg(A')C or c

CHARACTER*1. Specifies the form of op(B) to be used inthe matrix multiplication as follows:

transb

Form of op(b)transb value

op(B) = BN or n

op(B) = B'T or t

op(B) = conjg(B')C or c

INTEGER. Specifies the number of rows of the matrix op(A)and of the matrix C. The value of m must be at least zero.

m

INTEGER. Specifies the number of columns of the matrixop(B) and the number of columns of the matrix C.

n

The value of n must be at least zero.

INTEGER. Specifies the number of columns of the matrixop(A) and the number of rows of the matrix op(B).

k

The value of k must be at least zero.

REAL for sgemmalphaDOUBLE PRECISION for dgemmCOMPLEX for cgemmDOUBLE COMPLEX for zgemmSpecifies the scalar alpha.

REAL for sgemmaDOUBLE PRECISION for dgemmCOMPLEX for cgemmDOUBLE COMPLEX for zgemmArray, DIMENSION (lda, ka), where ka is k when transa='N' or 'n', and is m otherwise. Before entry with transa='N' or 'n', the leading m-by-k part of the array a mustcontain the matrix A, otherwise the leading k-by-m part ofthe array a must contain the matrix A.

146


INTEGER. Specifies the first dimension of a as declared inthe calling (sub)program. When transa= 'N' or 'n', thenlda must be at least max(1, m), otherwise lda must be atleast max(1, k).

lda

REAL for sgemmbDOUBLE PRECISION for dgemmCOMPLEX for cgemmDOUBLE COMPLEX for zgemmArray, DIMENSION (ldb, kb), where kb is n when transb= 'N' or 'n', and is k otherwise. Before entry with transb= 'N' or 'n', the leading k-by-n part of the array b mustcontain the matrix B, otherwise the leading n-by-k part ofthe array b must contain the matrix B.

INTEGER. Specifies the first dimension of b as declared inthe calling (sub)program. When transb = 'N' or 'n', thenldb must be at least max(1, k), otherwise ldb must be atleast max(1, n).

ldb

REAL for sgemmbetaDOUBLE PRECISION for dgemmCOMPLEX for cgemmDOUBLE COMPLEX for zgemmSpecifies the scalar beta.When beta is equal to zero, then c need not be set on input.

REAL for sgemmcDOUBLE PRECISION for dgemmCOMPLEX for cgemmDOUBLE COMPLEX for zgemmArray, DIMENSION (ldc, n).Before entry, the leading m-by-n part of the array c mustcontain the matrix C, except when beta is equal to zero, inwhich case c need not be set on entry.

INTEGER. Specifies the first dimension of c as declared inthe calling (sub)program. The value of ldc must be at leastmax(1, m).

ldc

147


Output Parameters

Overwritten by the m-by-n matrix (alpha*op(A)*op(B) +beta*C).

c



Specific details for the routine gemm interface are the following:

Holds the matrix A of size (ma,ka) whereaka = k if transa= 'N',ka = m otherwise,ma = m if transa= 'N',ma = k otherwise.

Holds the matrix B of size (mb,kb) wherebkb = n if transb = 'N',kb = k otherwise,mb = k if transb = 'N',mb = n otherwise.

Holds the matrix C of size (m,n).c

Must be 'N', 'C', or 'T'.transaThe default value is 'N'.

Must be 'N', 'C', or 'T'.transbThe default value is 'N'.



148


?hemmComputes a scalar-matrix-matrix product (eitherone of the matrices is Hermitian) and adds theresult to scalar-matrix product.

Syntax

Fortran 77:

call chemm(side, uplo, m, n, alpha, a, lda, b, ldb, beta, c, ldc )

call zhemm(side, uplo, m, n, alpha, a, lda, b, ldb, beta, c, ldc )

Fortran 95:

call hemm(a, b, c [,side][,uplo] [,alpha][,beta])

Description

The ?hemm routines perform a matrix-matrix operation using Hermitian matrices. The operationis defined as

C := alpha*A*B + beta*C

or

C := alpha*B*A + beta*C,

where:


A is an Hermitian matrix,

B and C are m-by-n matrices.

Input Parameters

CHARACTER*1. Specifies whether the Hermitian matrix Aappears on the left or right in the operation as follows:

side

Operation To Be Performedsidevalue

C := alpha*A*B + beta*CL or l

C := alpha*B*A + beta*CR or r

149


CHARACTER*1. Specifies whether the upper or lowertriangular part of the Hermitian matrix A is to be referencedas follows:

uplo

Part of Matrix A To Be Referenceduplo value

Only the upper triangular part of theHermitian matrix is to be referenced.

U or u

Only the lower triangular part of theHermitian matrix is to be referenced.

L or l

INTEGER. Specifies the number of rows of the matrix C.mThe value of m must be at least zero.

INTEGER. Specifies the number of columns of the matrix C.nThe value of n must be at least zero.

COMPLEX for chemmalphaDOUBLE COMPLEX for zhemmSpecifies the scalar alpha.

COMPLEX for chemmaDOUBLE COMPLEX for zhemmArray, DIMENSION (lda,ka), where ka is m when side ='L' or 'l' and is n otherwise. Before entry with side ='L' or 'l', the m-by-m part of the array a must contain theHermitian matrix, such that when uplo = 'U' or 'u', theleading m-by-m upper triangular part of the array a mustcontain the upper triangular part of the Hermitian matrixand the strictly lower triangular part of a is not referenced,and when uplo = 'L' or 'l', the leading m-by-m lowertriangular part of the array a must contain the lowertriangular part of the Hermitian matrix, and the strictly uppertriangular part of a is not referenced.Before entry with side = 'R' or 'r', the n-by-n part ofthe array a must contain the Hermitian matrix, such thatwhen uplo = 'U' or 'u', the leading n-by-n uppertriangular part of the array a must contain the uppertriangular part of the Hermitian matrix and the strictly lowertriangular part of a is not referenced, and when uplo ='L' or 'l', the leading n-by-n lower triangular part of thearray a must contain the lower triangular part of the

150


Hermitian matrix, and the strictly upper triangular part ofa is not referenced. The imaginary parts of the diagonalelements need not be set, they are assumed to be zero.

INTEGER. Specifies the first dimension of a as declared inthe calling (sub) program. When side = 'L' or 'l' thenlda must be at least max(1, m), otherwise lda must be atleast max(1,n).

lda

COMPLEX for chemmbDOUBLE COMPLEX for zhemmArray, DIMENSION (ldb,n).Before entry, the leading m-by-n part of the array b mustcontain the matrix B.

INTEGER. Specifies the first dimension of b as declared inthe calling (sub)program. The value of ldb must be at leastmax(1, m).

ldb

COMPLEX for chemmbetaDOUBLE COMPLEX for zhemmSpecifies the scalar beta.When beta is supplied as zero, then c need not be set oninput.

COMPLEX for chemmcDOUBLE COMPLEX for zhemmArray, DIMENSION (c, n). Before entry, the leading m-by-npart of the array c must contain the matrix C, except whenbeta is zero, in which case c need not be set on entry.


ldc

Output Parameters

Overwritten by the m-by-n updated matrix.c



151


Specific details for the routine hemm interface are the following:

Holds the matrix A of size (k,k) whereak = m if side = 'L',k = n otherwise.

Holds the matrix B of size (m,n).b


Must be 'L' or 'R'. The default value is 'L'.side




?herkPerforms a rank-n update of a Hermitian matrix.

Syntax

Fortran 77:

call cherk(uplo, trans, n, k, alpha, a, lda, beta, c, ldc )

call zherk(uplo, trans, n, k, alpha, a, lda, beta, c, ldc )

Fortran 95:

call herk(a, c [,uplo] [, trans] [,alpha][,beta])

Description

The ?herk routines perform a matrix-matrix operation using Hermitian matrices. The operationis defined as

C := alpha*A*conjg(A') + beta*C,

or

C := alpha*conjg(A')*A + beta*C,

where:

alpha and beta are real scalars,

C is an n-by-n Hermitian matrix,

152


A is an n-by-k matrix in the first case and a k-by-n matrix in the second case.

Input Parameters

CHARACTER*1. Specifies whether the upper or lowertriangular part of the array c is to be referenced as follows:

uplo

Part of Array c To Be Referenceduplo value

Only the upper triangular part of c is tobe referenced.

U or u

Only the lower triangular part of c is tobe referenced.

L or l

CHARACTER*1. Specifies the operation to be performed asfollows:

trans


C:= alpha*A*conjg(A')+beta*CN or n

C:= alpha*conjg(A')*A+beta*CC or c

INTEGER. Specifies the order of the matrix C. The value ofn must be at least zero.

n

INTEGER. With trans = 'N' or 'n', k specifies the numberof columns of the matrix A, and with trans = 'C' or 'c',k specifies the number of rows of the matrix A.

k


REAL for cherkalphaDOUBLE PRECISION for zherkSpecifies the scalar alpha.

COMPLEX for cherkaDOUBLE COMPLEX for zherkArray, DIMENSION (lda, ka), where ka is k when trans= 'N' or 'n', and is n otherwise. Before entry with trans= 'N' or 'n', the leading n-by-k part of the array a mustcontain the matrix a, otherwise the leading k-by-n part ofthe array a must contain the matrix A.

153


INTEGER. Specifies the first dimension of a as declared inthe calling (sub)program. When trans = 'N' or 'n', thenlda must be at least max(1, n), otherwise lda must be atleast max(1, k).

lda

REAL for cherkbetaDOUBLE PRECISION for zherkSpecifies the scalar beta.

COMPLEX for cherkcDOUBLE COMPLEX for zherkArray, DIMENSION (ldc,n).Before entry with uplo = 'U' or 'u', the leading n-by-nupper triangular part of the array c must contain the uppertriangular part of the Hermitian matrix and the strictly lowertriangular part of c is not referenced.Before entry with uplo = 'L' or 'l', the leading n-by-nlower triangular part of the array c must contain the lowertriangular part of the Hermitian matrix and the strictly uppertriangular part of c is not referenced.The imaginary parts of the diagonal elements need not beset, they are assumed to be zero.

INTEGER. Specifies the first dimension of c as declared inthe calling (sub)program. The value of ldc must be at leastmax(1, n).

ldc

Output Parameters

With uplo = 'U' or 'u', the upper triangular part of thearray c is overwritten by the upper triangular part of theupdated matrix.

c

With uplo = 'L' or 'l', the lower triangular part of thearray c is overwritten by the lower triangular part of theupdated matrix.The imaginary parts of the diagonal elements are set tozero.

154




Specific details for the routine herk interface are the following:

Holds the matrix A of size (ma,ka) whereaka = k if transa= 'N',ka = n otherwise,ma = n if transa= 'N',ma = k otherwise.

Holds the matrix C of size (n,n).c


Must be 'N' or 'C'. The default value is 'N'.trans



?her2kPerforms a rank-2k update of a Hermitian matrix.

Syntax

Fortran 77:

call cher2k(uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc )

call zher2k(uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc )

Fortran 95:

call her2k(a, b, c [,uplo][,trans] [,alpha][,beta])

Description

The ?her2k routines perform a rank-2k matrix-matrix operation using Hermitian matrices. Theoperation is defined as

C := alpha*A*conjg(B') + conjg(alpha)*B*conjg(A') + beta*C,

155


or

C := alpha *conjg(B')*A + conjg(alpha) *conjg(A')*B + beta*C,

where:

alpha is a scalar and beta is a real scalar,

C is an n-by-n Hermitian matrix,

A and B are n-by-k matrices in the first case and k-by-n matrices in the second case.

Input Parameters


uplo



U or u


L or l


trans


C:=alpha*A*conjg(B') +alpha*B*conjg(A') + beta*C

N or n

C:=alpha*conjg(A')*B +alpha*conjg(B')*A + beta*C

C or c


n

INTEGER. With trans = 'N' or 'n', k specifies the numberof columns of the matrix A, and with trans = 'C' or 'c',k specifies the number of rows of the matrix A.

k

The value of k must be at least equal to zero.

COMPLEX for cher2kalphaDOUBLE COMPLEX for zher2kSpecifies the scalar alpha.

COMPLEX for cher2kaDOUBLE COMPLEX for zher2k

156


Array, DIMENSION (lda, ka), where ka is k when trans= 'N' or 'n', and is n otherwise. Before entry with trans= 'N' or 'n', the leading n-by-k part of the array a mustcontain the matrix A, otherwise the leading k-by-n part ofthe array a must contain the matrix A.


lda

REAL for cher2kbetaDOUBLE PRECISION for zher2kSpecifies the scalar beta.

COMPLEX for cher2kbDOUBLE COMPLEX for zher2kArray, DIMENSION (ldb, kb), where kb is k when trans= 'N' or 'n', and is n otherwise. Before entry with trans= 'N' or 'n', the leading n-by-k part of the array b mustcontain the matrix B, otherwise the leading k-by-n part ofthe array b must contain the matrix B.

INTEGER. Specifies the first dimension of b as declared inthe calling (sub)program. When trans = 'N' or 'n', thenldb must be at least max(1, n), otherwise ldb must be atleast max(1, k).

ldb

COMPLEX for cher2kcDOUBLE COMPLEX for zher2kArray, DIMENSION (ldc,n).Before entry with uplo = 'U' or 'u', the leading n-by-nupper triangular part of the array c must contain the uppertriangular part of the Hermitian matrix and the strictly lowertriangular part of c is not referenced.Before entry with uplo = 'L' or 'l', the leading n-by-nlower triangular part of the array c must contain the lowertriangular part of the Hermitian matrix and the strictly uppertriangular part of c is not referenced.The imaginary parts of the diagonal elements need not beset, they are assumed to be zero.

157



ldc

Output Parameters


c

With uplo = 'L' or 'l', the lower triangular part of thearray c is overwritten by the lower triangular part of theupdated matrix.The imaginary parts of the diagonal elements are set tozero.



Specific details for the routine her2k interface are the following:

Holds the matrix A of size (ma,ka) whereaka = k if trans = 'N',ka = n otherwise,ma = n if trans = 'N',ma = k otherwise.

Holds the matrix B of size (mb,kb) wherebkb = k if trans = 'N',kb = n otherwise,mb = n if trans = 'N',mb = k otherwise.






158


?symmPerforms a scalar-matrix-matrix product(one matrixoperand is symmetric) and adds the result to ascalar-matrix product.

Syntax

Fortran 77:

call ssymm(side, uplo, m, n, alpha, a, lda, b, ldb, beta, c, ldc )

call dsymm(side, uplo, m, n, alpha, a, lda, b, ldb, beta, c, ldc )

call csymm(side, uplo, m, n, alpha, a, lda, b, ldb, beta, c, ldc )

call zsymm(side, uplo, m, n, alpha, a, lda, b, ldb, beta, c, ldc )

Fortran 95:

call symm(a, b, c [,side][,uplo] [,alpha][,beta])

Description

The ?symm routines perform a matrix-matrix operation using symmetric matrices. The operationis defined as

C := alpha*A*B + beta*C,

or

C := alpha*B*A + beta*C,

where:


A is a symmetric matrix,

B and C are m-by-n matrices.

Input Parameters

CHARACTER*1. Specifies whether the symmetric matrix Aappears on the left or right in the operation as follows:

side

Operation to be Performedside value

Ac := alpha*A*B + beta*CL or l

159


c := alpha*B*A + beta*CR or r

CHARACTER*1. Specifies whether the upper or lowertriangular part of the symmetric matrix A is to be referencedas follows:

uplo


Only the upper triangular part of thesymmetric matrix is to be referenced.

U or u

Only the lower triangular part of thesymmetric matrix is to be referenced.

L or l

INTEGER. Specifies the number of rows of the matrix C.mThe value of m must be at least zero.

INTEGER. Specifies the number of columns of the matrix C.nThe value of n must be at least zero.

REAL for ssymmalphaDOUBLE PRECISION for dsymmCOMPLEX for csymmDOUBLE COMPLEX for zsymmSpecifies the scalar alpha.

REAL for ssymmaDOUBLE PRECISION for dsymmCOMPLEX for csymmDOUBLE COMPLEX for zsymmArray, DIMENSION (lda, ka), where ka is m when side ='L' or 'l' and is n otherwise.Before entry with side = 'L' or 'l', the m-by-m part ofthe array a must contain the symmetric matrix, such thatwhen uplo = 'U' or 'u', the leading m-by-m uppertriangular part of the array a must contain the uppertriangular part of the symmetric matrix and the strictly lowertriangular part of a is not referenced, and when uplo ='L' or 'l', the leading m-by-m lower triangular part of thearray a must contain the lower triangular part of thesymmetric matrix and the strictly upper triangular part ofa is not referenced.

160


Before entry with side = 'R' or 'r', the n-by-n part ofthe array a must contain the symmetric matrix, such thatwhen uplo = 'U' or 'u', the leading n-by-n uppertriangular part of the array a must contain the uppertriangular part of the symmetric matrix and the strictly lowertriangular part of a is not referenced, and when uplo ='L' or 'l', the leading n-by-n lower triangular part of thearray a must contain the lower triangular part of thesymmetric matrix and the strictly upper triangular part ofa is not referenced.

INTEGER. Specifies the first dimension of a as declared inthe calling (sub)program. When side = 'L' or 'l' thenlda must be at least max(1, m), otherwise lda must be atleast max(1, n).

lda

REAL for ssymmbDOUBLE PRECISION for dsymmCOMPLEX for csymmDOUBLE COMPLEX for zsymmArray, DIMENSION (ldb,n). Before entry, the leading m-by-npart of the array b must contain the matrix B.


ldb

REAL for ssymmbetaDOUBLE PRECISION for dsymmCOMPLEX for csymmDOUBLE COMPLEX for zsymmSpecifies the scalar beta.When beta is set to zero, then c need not be set on input.

REAL for ssymmcDOUBLE PRECISION for dsymmCOMPLEX for csymmDOUBLE COMPLEX for zsymmArray, DIMENSION (ldc,n). Before entry, the leading m-by-npart of the array c must contain the matrix C, except whenbeta is zero, in which case c need not be set on entry.

161



ldc

Output Parameters

Overwritten by the m-by-n updated matrix.c



Specific details for the routine symm interface are the following:

Holds the matrix A of size (k,k) whereak = m if side = 'L',k = n otherwise.







?syrkPerforms a rank-n update of a symmetric matrix.

Syntax

Fortran 77:

call ssyrk(uplo, trans, n, k, alpha, a, lda, beta, c, ldc )

call dsyrk(uplo, trans, n, k, alpha, a, lda, beta, c, ldc )

call csyrk(uplo, trans, n, k, alpha, a, lda, beta, c, ldc )

call zsyrk(uplo, trans, n, k, alpha, a, lda, beta, c, ldc )

162


Fortran 95:

call syrk(a, c [,uplo] [, trans] [,alpha][,beta])

Description

The ?syrk routines perform a matrix-matrix operation using symmetric matrices. The operationis defined as

C := alpha*A*A' + beta*C,

or

C := alpha*A'*A + beta*C,

where:


C is an n-by-n symmetric matrix,

A is an n-by-k matrix in the first case and a k-by-n matrix in the second case.

Input Parameters


uplo



U or u


L or l


trans


C := alpha*A*A' + beta*CN or n

C := alpha*A'*A + beta*CT or t

C := alpha*A'*A + beta*CC or c


n

163


INTEGER. On entry with trans = 'N' or 'n', k specifiesthe number of columns of the matrix a, and on entry withtrans = 'T' or 't' or 'C' or 'c', k specifies the numberof rows of the matrix a.

k


REAL for ssyrkalphaDOUBLE PRECISION for dsyrkCOMPLEX for csyrkDOUBLE COMPLEX for zsyrkSpecifies the scalar alpha.

REAL for ssyrkaDOUBLE PRECISION for dsyrkCOMPLEX for csyrkDOUBLE COMPLEX for zsyrkArray, DIMENSION (lda,ka), where ka is k when trans ='N' or 'n', and is n otherwise. Before entry with trans ='N' or 'n', the leading n-by-k part of the array a mustcontain the matrix A, otherwise the leading k-by-n part ofthe array a must contain the matrix A.

INTEGER. Specifies the first dimension of a as declared inthe calling (sub)program. When trans = 'N' or 'n', thenlda must be at least max(1,n), otherwise lda must be atleast max(1, k).

lda

REAL for ssyrkbetaDOUBLE PRECISION for dsyrkCOMPLEX for csyrkDOUBLE COMPLEX for zsyrkSpecifies the scalar beta.

REAL for ssyrkcDOUBLE PRECISION for dsyrkCOMPLEX for csyrkDOUBLE COMPLEX for zsyrkArray, DIMENSION (ldc,n). Before entry with uplo = 'U'or 'u', the leading n-by-n upper triangular part of the arrayc must contain the upper triangular part of the symmetricmatrix and the strictly lower triangular part of c is notreferenced.

164


Before entry with uplo = 'L' or 'l', the leading n-by-nlower triangular part of the array c must contain the lowertriangular part of the symmetric matrix and the strictly uppertriangular part of c is not referenced.


ldc

Output Parameters


c

With uplo = 'L' or 'l', the lower triangular part of thearray c is overwritten by the lower triangular part of theupdated matrix.



Specific details for the routine syrk interface are the following:

Holds the matrix A of size (ma,ka) whereaka = k if transa= 'N',ka = n otherwise,ma = n if transa= 'N',ma = k otherwise.






165


?syr2kPerforms a rank-2k update of a symmetric matrix.

Syntax

Fortran 77:

call ssyr2k(uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc )

call dsyr2k(uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc )

call csyr2k(uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc )

call zsyr2k(uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc )

Fortran 95:

call syr2k(a, b, c [,uplo][,trans] [,alpha][,beta])

Description

The ?syr2k routines perform a rank-2k matrix-matrix operation using symmetric matrices.The operation is defined as

C := alpha*A*B' + alpha*B*A' + beta*C,

or

C := alpha*a'*B + alpha*B'*A + beta*C,

where:


C is an n-by-n symmetric matrix,

A and B are n-by-k matrices in the first case, and k-by-n matrices in the second case.

Input Parameters


uplo



U or u

166



L or l


trans


C := alpha*A*B'+alpha*B*A'+beta*CN or n

C := alpha*A'*B +alpha*B'*A+beta*C

T or t

C := alpha*A'*B +alpha*B'*A+beta*C

C or c


n

INTEGER. On entry with trans = 'N' or 'n', k specifiesthe number of columns of the matrices A and B, and onentry with trans = 'T' or 't' or 'C' or 'c', k specifiesthe number of rows of the matrices A and B. The value of kmust be at least zero.

k

REAL for ssyr2kalphaDOUBLE PRECISION for dsyr2kCOMPLEX for csyr2kDOUBLE COMPLEX for zsyr2kSpecifies the scalar alpha.

REAL for ssyr2kaDOUBLE PRECISION for dsyr2kCOMPLEX for csyr2kDOUBLE COMPLEX for zsyr2kArray, DIMENSION (lda,ka), where ka is k when trans ='N' or 'n', and is n otherwise. Before entry with trans ='N' or 'n', the leading n-by-k part of the array a mustcontain the matrix A, otherwise the leading k-by-n part ofthe array a must contain the matrix A.


lda

167


REAL for ssyr2kbDOUBLE PRECISION for dsyr2kCOMPLEX for csyr2kDOUBLE COMPLEX for zsyr2kArray, DIMENSION (ldb, kb) where kb is k when trans ='N' or 'n' and is 'n' otherwise. Before entry with trans= 'N' or 'n', the leading n-by-k part of the array b mustcontain the matrix B, otherwise the leading k-by-n part ofthe array b must contain the matrix B.

INTEGER. Specifies the first dimension of a as declared inthe calling (sub)program. When trans = 'N' or 'n', thenldb must be at least max(1, n), otherwise ldb must be atleast max(1, k).

ldb

REAL for ssyr2kbetaDOUBLE PRECISION for dsyr2kCOMPLEX for csyr2kDOUBLE COMPLEX for zsyr2kSpecifies the scalar beta.

REAL for ssyr2kcDOUBLE PRECISION for dsyr2kCOMPLEX for csyr2kDOUBLE COMPLEX for zsyr2kArray, DIMENSION (ldc,n). Before entry with uplo = 'U'or 'u', the leading n-by-n upper triangular part of the arrayc must contain the upper triangular part of the symmetricmatrix and the strictly lower triangular part of c is notreferenced.Before entry with uplo = 'L' or 'l', the leading n-by-nlower triangular part of the array c must contain the lowertriangular part of the symmetric matrix and the strictly uppertriangular part of c is not referenced.


ldc

168


Output Parameters


c

With uplo = 'L' or 'l', the lower triangular part of thearray c is overwritten by the lower triangular part of theupdated matrix.



Specific details for the routine syr2k interface are the following:

Holds the matrix A of size (ma,ka) whereaka = k if trans = 'N',ka = n otherwise,ma = n if trans = 'N',ma = k otherwise.

Holds the matrix B of size (mb,kb) wherebkb = k if trans = 'N',kb = n otherwise,mb = n if trans = 'N',mb = k otherwise.






169


?trmmComputes a scalar-matrix-matrix product (onematrix operand is triangular).

Syntax

Fortran 77:

call strmm(side, uplo, transa, diag, m, n, alpha, a, lda, b, ldb )

call dtrmm(side, uplo, transa, diag, m, n, alpha, a, lda, b, ldb )

call ctrmm(side, uplo, transa, diag, m, n, alpha, a, lda, b, ldb )

call ztrmm(side, uplo, transa, diag, m, n, alpha, a, lda, b, ldb )

Fortran 95:

call trmm(a, b [,side] [, uplo] [,transa][,diag] [,alpha])

Description

The ?trmm routines perform a matrix-matrix operation using triangular matrices. The operationis defined as

B := alpha*op(A)*B

or

B := alpha*B*op(A)

where:

alpha is a scalar,

B is an m-by-n matrix,

A is a unit, or non-unit, upper or lower triangular matrix

op(A) is one of op(A) = A, or op(A) = A', or op(A) = conjg(A').

Input Parameters

CHARACTER*1. Specifies whether op(A) multiplies B fromthe left or right in the operation as follows:

side

Operation To Be Performedside value

B := alpha*op(A)*BL or l

170


B := alpha*B*op(A)R or r

CHARACTER*1. Specifies whether the matrix A is an upperor lower triangular matrix as follows:

uplo

Matrix Auplo value

Matrix A is an upper triangular matrix.U or u

Matrix A is a lower triangular matrix.L or l

CHARACTER*1. Specifies the form of op(A) to be used inthe matrix multiplication as follows:

transa

Form of op(A)transa value

op(A) = AN or n

op(A) = A'T or t


CHARACTER*1. Specifies whether or not A is unit triangularas follows:

diag

Matrix Adiag value



N or n

INTEGER. Specifies the number of rows of B. The value ofm must be at least zero.

m

INTEGER. Specifies the number of columns of B. The valueof n must be at least zero.

n

REAL for strmmalphaDOUBLE PRECISION for dtrmmCOMPLEX for ctrmmDOUBLE COMPLEX for ztrmmSpecifies the scalar alpha.When alpha is zero, then a is not referenced and b neednot be set before entry.

REAL for strmmaDOUBLE PRECISION for dtrmmCOMPLEX for ctrmm

171


DOUBLE COMPLEX for ztrmmArray, DIMENSION (lda,k), where k is m when side = 'L'or 'l' and is n when side = 'R' or 'r'. Before entry withuplo = 'U' or 'u', the leading k by k upper triangularpart of the array a must contain the upper triangular matrixand the strictly lower triangular part of a is not referenced.Before entry with uplo = 'L' or 'l', the leading k by klower triangular part of the array a must contain the lowertriangular matrix and the strictly upper triangular part of ais not referenced.When diag = 'U' or 'u', the diagonal elements of a arenot referenced either, but are assumed to be unity.

INTEGER. Specifies the first dimension of a as declared inthe calling (sub)program. When side = 'L' or 'l', thenlda must be at least max(1, m), when side = 'R' or 'r',then lda must be at least max(1, n).

lda

REAL for strmmbDOUBLE PRECISION for dtrmmCOMPLEX for ctrmmDOUBLE COMPLEX for ztrmmArray, DIMENSION (ldb,n).Before entry, the leading m-by-n part of the array b mustcontain the matrix B.


ldb

Output Parameters

Overwritten by the transformed matrix.b



Specific details for the routine trmm interface are the following:

Holds the matrix A of size (k,k) wherea

172


k = m if side = 'L',k = n otherwise.







?trsmSolves a matrix equation (one matrix operand istriangular).

Syntax

Fortran 77:

call strsm(side, uplo, transa, diag, m, n, alpha, a, lda, b, ldb )

call dtrsm(side, uplo, transa, diag, m, n, alpha, a, lda, b, ldb )

call ctrsm(side, uplo, transa, diag, m, n, alpha, a, lda, b, ldb )

call ztrsm(side, uplo, transa, diag, m, n, alpha, a, lda, b, ldb )

Fortran 95:

call trsm(a, b [,side] [, uplo] [,transa][,diag] [,alpha])

Description

The ?trsm routines solve one of the following matrix equations:

op(A)*X = alpha*B,

or

X*op(A) = alpha*B,

where:

alpha is a scalar,

173


X and B are m-by-n matrices,

A is a unit, or non-unit, upper or lower triangular matrix

op(A) is one of op(A) = A, or op(A) = A', or op(A) = conjg(A').

The matrix B is overwritten by the solution matrix X.

Input Parameters

CHARACTER*1. Specifies whether op(A) appears on the leftor right of X for the operation to be performed as follows:

side

Operation To Be Performedside value

op(A)*X = alpha*BL or l

X*op(A) = alpha*BR or r

CHARACTER*1. Specifies whether the matrix A is an upperor lower triangular matrix as follows:

uplo

Matrix Auplo value

Matrix A is an upper triangular matrix.U or u

Matrix A is a lower triangular matrix.L or l

CHARACTER*1. Specifies the form of op(a) to be used inthe matrix multiplication as follows:

transa

Form of op(a)transa value

op(A) = AN or n

op(A) = A'T or t


CHARACTER*1. Specifies whether or not the matrix A is unittriangular as follows:

diag

Matrix Adiag value



N or n

INTEGER. Specifies the number of rows of B. The value ofm must be at least zero.

m

174


INTEGER. Specifies the number of columns of B. The valueof n must be at least zero.

n

REAL for strsmalphaDOUBLE PRECISION for dtrsmCOMPLEX for ctrsmDOUBLE COMPLEX for ztrsmSpecifies the scalar alpha.When alpha is zero, then a is not referenced and b neednot be set before entry.

REAL for strsmaDOUBLE PRECISION for dtrsmCOMPLEX for ctrsmDOUBLE COMPLEX for ztrsmArray, DIMENSION (lda, k), where k is m when side ='L' or 'l' and is n when side = 'R' or 'r'. Before entrywith uplo = 'U' or 'u', the leading k by k upper triangularpart of the array a must contain the upper triangular matrixand the strictly lower triangular part of a is not referenced.Before entry with uplo = 'L' or 'l', the leading k by klower triangular part of the array a must contain the lowertriangular matrix and the strictly upper triangular part of ais not referenced.When diag = 'U' or 'u', the diagonal elements of a arenot referenced either, but are assumed to be unity.

INTEGER. Specifies the first dimension of a as declared inthe calling (sub)program. When side = 'L' or 'l', thenlda must be at least max(1, m), when side = 'R' or 'r',then lda must be at least max(1, n).

lda

REAL for strsmbDOUBLE PRECISION for dtrsmCOMPLEX for ctrsmDOUBLE COMPLEX for ztrsmArray, DIMENSION (ldb,n). Before entry, the leading m-by-npart of the array b must contain the right-hand side matrixB.

175


INTEGER. Specifies the first dimension of b as declared inthe calling (sub)program. The value of ldb must be at leastmax(1, +m).

ldb

Output Parameters

Overwritten by the solution matrix X.b



Specific details for the routine trsm interface are the following:

Holds the matrix A of size (k,k) where k = m if side = 'L', k =n otherwise.

a







Sparse BLAS Level 1 Routines and FunctionsThis section describes Sparse BLAS Level 1, an extension of BLAS Level 1 included in the Intel®Math Kernel Library beginning with the Intel MKL release 2.1. Sparse BLAS Level 1 is a groupof routines and functions that perform a number of common vector operations on sparse vectorsstored in compressed form.

Sparse vectors are those in which the majority of elements are zeros. Sparse BLAS routinesand functions are specially implemented to take advantage of vector sparsity. This allows youto achieve large savings in computer time and memory. If nz is the number of non-zero vectorelements, the computer time taken by Sparse BLAS operations will be O(nz).

176


Vector Arguments

Compressed sparse vectors. Let a be a vector stored in an array, and assume that the onlynon-zero elements of a are the following:

a(k1), a (k2), a (k3) . . . a(knz),

where nz is the total number of non-zero elements in a.

In Sparse BLAS, this vector can be represented in compressed form by two FORTRAN arrays,x (values) and indx (indices). Each array has nz elements:

x(1)=a(k1), x(2)=a(k2), . . . x(nz)= a(knz),

indx(1)=k1, indx(2)=k2, . . . indx(nz)= knz.

Thus, a sparse vector is fully determined by the triple (nz, x, indx). If you pass a negative orzero value of nz to Sparse BLAS, the subroutines do not modify any arrays or variables.

Full-storage vectors. Sparse BLAS routines can also use a vector argument fully stored in asingle FORTRAN array (a full-storage vector). If y is a full-storage vector, its elements mustbe stored contiguously: the first element in y(1), the second in y(2), and so on. Thiscorresponds to an increment incy = 1 in BLAS Level 1. No increment value for full-storagevectors is passed as an argument to Sparse BLAS routines or functions.

Naming Conventions

Similar to BLAS, the names of Sparse BLAS subprograms have prefixes that determine the datatype involved: s and d for single- and double-precision real; c and z for single- anddouble-precision complex respectively.

If a Sparse BLAS routine is an extension of a “dense” one, the subprogram name is formed byappending the suffix i (standing for indexed) to the name of the corresponding “dense”subprogram. For example, the Sparse BLAS routine saxpyi corresponds to the BLAS routinesaxpy, and the Sparse BLAS function cdotci corresponds to the BLAS function cdotc.

Routines and Data Types

Routines and data types supported in the Intel MKL implementation of Sparse BLAS are listedin Table 2-4 .

177


Table 2-4 Sparse BLAS Routines and Their Data Types

DescriptionData TypesRoutine/Function

Scalar-vector product plus vector (routines)s, d, c, z?axpyi

Dot product (functions)s, d?doti

Complex dot product conjugated (functions)c, z?dotci

Complex dot product unconjugated (functions)c, z?dotui

Gathering a full-storage sparse vector intocompressed form nz, x, indx (routines)

s, d, c, z?gthr

Gathering a full-storage sparse vector intocompressed form and assigning zeros to gatheredelements in the full-storage vector (routines)

s, d, c, z?gthrz

Givens rotation (routines)s, d?roti

Scattering a vector from compressed form tofull-storage form (routines)

s, d, c, z?sctr

BLAS Level 1 Routines That Can Work With Sparse Vectors

The following BLAS Level 1 routines will give correct results when you pass to them acompressed-form array x(with the increment incx=1):

sum of absolute values of vector elements?asum

copying a vector?copy

Euclidean norm of a vector?nrm2

scaling a vector?scal

index of the element with the largest absolute value for real flavors, orthe largest sum |Rex(i)|+|Imx(i)| for complex flavors.

i?amax

index of the element with the smallest absolute value for real flavors,or the smallest sum |Rex(i)|+|Imx(i)| for complex flavors.

i?amin

The result i returned by i?amax and i?amin should be interpreted as index in thecompressed-form array, so that the largest (smallest) value is x(i); the corresponding indexin full-storage array is indx(i).

178


You can also call ?roti to compute the parameters of Givens rotation and then pass theseparameters to the Sparse BLAS routines ?roti.

?axpyiAdds a scalar multiple of compressed sparse vectorto a full-storage vector.

Syntax

Fortran 77:

call saxpyi(nz, a, x, indx, y)

call daxpyi(nz, a, x, indx, y)

call caxpyi(nz, a, x, indx, y)

call zaxpyi(nz, a, x, indx, y)

Fortran 95:

call axpyi(x, indx, y [, a])

Description

The ?axpyi routines perform a vector-vector operation defined as

y := a*x + y

where:

a is a scalar,

x is a sparse vector stored in compressed form,

y is a vector in full storage form.

The ?axpyi routines reference or modify only the elements of y whose indices are listed in thearray indx.

The values in indx must be distinct.

Input Parameters

INTEGER. The number of elements in x and indx.nz

REAL for saxpyiaDOUBLE PRECISION for daxpyi

179


COMPLEX for caxpyiDOUBLE COMPLEX for zaxpyiSpecifies the scalar a.

REAL for saxpyixDOUBLE PRECISION for daxpyiCOMPLEX for caxpyiDOUBLE COMPLEX for zaxpyiArray, DIMENSION at least nz.

INTEGER. Specifies the indices for the elements of x.indxArray, DIMENSION at least nz.

REAL for saxpyiyDOUBLE PRECISION for daxpyiCOMPLEX for caxpyiDOUBLE COMPLEX for zaxpyiArray, DIMENSION at least max(indx(i)).

Output Parameters

Contains the updated vector y.y



Specific details for the routine axpyi interface are the following:

Holds the vector of length (nz).x

Holds the vector of length (nz).indx

Holds the vector of length (nz).y

The default value is 1.a

180


?dotiComputes the dot product of a compressed sparsereal vector by a full-storage real vector.

Syntax

Fortran 77:

res = sdoti(nz, x, indx, y )

res = ddoti(nz, x, indx, y )

Fortran 95:

res = doti(x, indx, y)

Description

The ?doti functions return the dot product of x and y defined as

x(1)*y(indx(1)) + x(2)*y(indx(2)) +...+ x(nz)*y(indx(nz))

where the triple (nz, x, indx) defines a sparse real vector stored in compressed form, and y isa real vector in full storage form. The functions reference only the elements of y whose indicesare listed in the array indx. The values in indx must be distinct.

Input Parameters

INTEGER. The number of elements in x and indx .nz

REAL for sdotixDOUBLE PRECISION for ddotiArray, DIMENSION at least nz.


REAL for sdotiyDOUBLE PRECISION for ddotiArray, DIMENSION at least max(indx(i)).

Output Parameters

REAL for sdotiresDOUBLE PRECISION for ddoti

181


Contains the dot product of x and y, if nz is positive.Otherwise, res contains 0.



Specific details for the routine doti interface are the following:




?dotciComputes the conjugated dot product of acompressed sparse complex vector with afull-storage complex vector.

Syntax

Fortran 77:

res = cdotci(nz, x, indx, y )

res = zdotci(nz, x, indx, y )

Fortran 95:

res = dotci(x, indx, y)

Description

The ?dotci functions return the dot product of x and y defined as

conjg(x(1))*y(indx(1)) + ... + conjg(x(nz))*y(indx(nz))

where the triple (nz, x, indx) defines a sparse complex vector stored in compressed form, andy is a real vector in full storage form. The functions reference only the elements of y whoseindices are listed in the array indx. The values in indx must be distinct.

182


Input Parameters


COMPLEX for cdotcixDOUBLE COMPLEX for zdotciArray, DIMENSION at least nz.


COMPLEX for cdotciyDOUBLE COMPLEX for zdotciArray, DIMENSION at least max(indx(i)).

Output Parameters

COMPLEX for cdotciresDOUBLE COMPLEX for zdotciContains the conjugated dot product of x and y, if nz ispositive. Otherwise, res contains 0.



Specific details for the routine dotci interface are the following:




183


?dotuiComputes the dot product of a compressed sparsecomplex vector by a full-storage complex vector.

Syntax

Fortran 77:

res = cdotui(nz, x, indx, y )

res = zdotui(nz, x, indx, y )

Fortran 95:

res = dotui(x, indx, y)

Description

The ?dotui functions return the dot product of x and y defined as

x(1)*y(indx(1)) + x(2)*y(indx(2)) +...+ x(nz)*y(indx(nz))

where the triple (nz, x, indx) defines a sparse complex vector stored in compressed form, andy is a real vector in full storage form. The functions reference only the elements of y whoseindices are listed in the array indx. The values in indx must be distinct.

Input Parameters


COMPLEX for cdotuixDOUBLE COMPLEX for zdotuiArray, DIMENSION at least nz.


COMPLEX for cdotuiyDOUBLE COMPLEX for zdotuiArray, DIMENSION at least max(indx(i)).

Output Parameters

COMPLEX for cdotuiresDOUBLE COMPLEX for zdotui

184


Contains the dot product of x and y, if nz is positive.Otherwise, res contains 0.



Specific details for the routine dotui interface are the following:




?gthrGathers a full-storage sparse vector's elementsinto compressed form.

Syntax

Fortran 77:

call sgthr(nz, y, x, indx )

call dgthr(nz, y, x, indx )

call cgthr(nz, y, x, indx )

call zgthr(nz, y, x, indx )

Fortran 95:

res = gthr(x, indx, y)

Description

The ?gthr routines gather the specified elements of a full-storage sparse vector y intocompressed form(nz, x, indx). The routines reference only the elements of y whose indicesare listed in the array indx:

x(i) = y(indx(i)), for i=1,2,... +nz.

185


Input Parameters

INTEGER. The number of elements of y to be gathered.nz

INTEGER. Specifies indices of elements to be gathered.indxArray, DIMENSION at least nz.

REAL for sgthryDOUBLE PRECISION for dgthrCOMPLEX for cgthrDOUBLE COMPLEX for zgthrArray, DIMENSION at least max(indx(i)).

Output Parameters

REAL for sgthrxDOUBLE PRECISION for dgthrCOMPLEX for cgthrDOUBLE COMPLEX for zgthrArray, DIMENSION at least nz.Contains the vector converted to the compressed form.



Specific details for the routine gthr interface are the following:




186


?gthrzGathers a sparse vector's elements intocompressed form, replacing them by zeros.

Syntax

Fortran 77:

call sgthrz(nz, y, x, indx )

call dgthrz(nz, y, x, indx )

call cgthrz(nz, y, x, indx )

call zgthrz(nz, y, x, indx )

Fortran 95:

res = gthrz(x, indx, y)

Description

The ?gthrz routines gather the elements with indices specified by the array indx from afull-storage vector y into compressed form (nz, x, indx) and overwrite the gathered elementsof y by zeros. Other elements of y are not referenced or modified (see also ?gthr).

Input Parameters

INTEGER. The number of elements of y to be gathered.nz

INTEGER. Specifies indices of elements to be gathered.indxArray, DIMENSION at least nz.

REAL for sgthrzyDOUBLE PRECISION for dgthrzCOMPLEX for cgthrzDOUBLE COMPLEX for zgthrzArray, DIMENSION at least max(indx(i)).

Output Parameters

REAL for sgthrzxDOUBLE PRECISION for d gthrzCOMPLEX for cgthrz

187


DOUBLE COMPLEX for zgthrzArray, DIMENSION at least nz.Contains the vector converted to the compressed form.

The updated vector y.y



Specific details for the routine gthrz interface are the following:




?rotiApplies Givens rotation to sparse vectors one ofwhich is in compressed form.

Syntax

Fortran 77:

call sroti(nz, x, indx, y, c, s)

call droti(nz, x, indx, y, c, s)

Fortran 95:

call roti(x, indx, y [, c] [,s])

Description

The ?roti routines apply the Givens rotation to elements of two real vectors, x (in compressedform nz, x, indx) and y (in full storage form):

x(i) = c*x(i) + s*y(indx(i))

y(indx(i)) = c*y(indx(i))- s*x(i)

The routines reference only the elements of y whose indices are listed in the array indx. Thevalues in indx must be distinct.

188


Input Parameters

INTEGER. The number of elements in x and indx.nz

REAL for srotixDOUBLE PRECISION for drotiArray, DIMENSION at least nz.


REAL for srotiyDOUBLE PRECISION for drotiArray, DIMENSION at least max(indx(i)).

A scalar: REAL for sroticDOUBLE PRECISION for droti.

A scalar: REAL for srotisDOUBLE PRECISION for droti.

Output Parameters

The updated arrays.x and y



Specific details for the routine roti interface are the following:




The default value is 1.c

The default value is 1.s

189


?sctrConverts compressed sparse vectors into fullstorage form.

Syntax

Fortran 77:

call ssctr(nz, x, indx, y )

call dsctr(nz, x, indx, y )

call csctr(nz, x, indx, y )

call zsctr(nz, x, indx, y )

Fortran 95:

call sctr(x, indx, y)

Description

The ?sctr routines scatter the elements of the compressed sparse vector (nz, x, indx) to afull-storage vector y. The routines modify only the elements of y whose indices are listed inthe array indx:

y(indx(i) = x(i), for i=1,2,... +nz.

Input Parameters

INTEGER. The number of elements of x to be scattered.nz

INTEGER. Specifies indices of elements to be scattered.indxArray, DIMENSION at least nz.

REAL for ssctrxDOUBLE PRECISION for dsctrCOMPLEX for csctrDOUBLE COMPLEX for zsctrArray, DIMENSION at least nz.Contains the vector to be converted to full-storage form.

Output Parameters

REAL for ssctry

190


DOUBLE PRECISION for dsctrCOMPLEX for csctrDOUBLE COMPLEX for zsctrArray, DIMENSION at least max(indx(i)).Contains the vector y with updated elements.



Specific details for the routine sctr interface are the following:




Sparse BLAS Level 2 and Level 3This section describes Sparse BLAS Level 2 and Level 3 included in the Intel® Math KernelLibrary. Sparse BLAS Level 2 is a group of routines and functions that perform operations ona sparse matrix and dense vectors. Sparse BLAS Level 3 is a group of routines and functionsthat perform operations on a sparse matrix and a dense matrices.

Sparse matrix is a matrix in which the majority of elements are zeros. Intel MKL sparse BLASroutines and functions are specially implemented to take advantage of matrix sparsity. Thisallows to achieve large savings in computer time and memory. The sparse BLAS routines canbe considered as building blocks for “Iterative Sparse Solvers based on Reverse CommunicationInterface (RCI ISS)” in Chapter 8 of the manual.

Naming Conventions in Sparse BLAS Level 2 and Level 3

Each Sparse BLAS routine has a six- or eight-characters base name preceding with the prefixmkl_ (mkl_cspblas_ for routines with simplified interface and zero-based indexing). Theroutines with standard interfaces have six-characters base names, the routines with simplifiedinterfaces have eight-characters base names in accordance with the templates:mkl<character code> <data> <operation>( )

191

ormkl<character code> <data> <mtype> <operation>( )

mkl_cspblas_<character code> <data> <mtype> <operation>( )

The <character code> is a character that indicates the data type:





NOTE. Current version of the Intel MKL Sparse BLAS supports only real data with doubleprecision.

The <data> field indicates the data structure of the sparse matrix (see section Sparse MatrixData Structures):

coordinate formatcoo

compressed sparse row format and its variationscsr

compressed sparse column format and its variationscsc

diagonal formatdia

skyline storage formatsky

block sparse row format and its variationsbsr

The <operation> field indicates the type of operation.

matrix-vector product (Level 2)mv

matrix-matrix product (Level 3)mm

solving a single triangular system (Level 2)sm

solving triangular systems with multiple right-hand sides (Level 3)sm

An optional field <mtype> indicates a matrix type and used in the routines with simplifiedinterfaces:

sparse representation of a general matrixge

sparse representation of the upper or lower triangle of a symmetricmatrix

sy

sparse representation of a triangular matrixtr

192

Sparse Matrix Data Structures

In the current version of Intel MKL sparse BLAS Level 2 and Level 3 the following point entry[Duff86] sparse matrix data structures are supported:

• compressed sparse row format(CSR) and its variation;

• compressed sparse column format(CSC);

• coordinate format;

• diagonal format;

• skyline storage format.

• block sparse row format(BSR) and its variations.

For more information on matrix storage schemes, see “Sparse Storage Formats for SparseBLAS Levels 2-3” in Appendix A.

Routines and Supported Operations

This section describes two main types of routines and supported operations. The followingnotations are used here:

A - is a sparse matrix;

B and C - are dense matrices;

D - is a diagonal scaling matrix;

x and y - are dense vectors;

alpha and beta - are scalars;

op(A) is one of the possible operations:

op(A) = A;

op(A) = A' - transpose of A;

op(A) = conj(A') - conjugated transpose of A.

Complete list of all routines is given in the Table 2-9 .

Routines with Standard Interface

Intel MKL Sparse BlAS routines support the following operations:

193


Level 2.

• computing a sparse matrix-dense vector product:

y := alpha*op(A)*x + beta*y

• solving a single triangular system:

y := alpha*inv(op(A))*x

Level 3.

• computing a sparse matrix-dense matrix product:

C := alpha*op(A)*B + beta*C

• solving a sparse triangular system with multiple right-hand sides:

C := alpha*inv(op(A))*B

These routines have native interface that differs from the interface used in the NIST SparseBLAS library [Rem05]. Detailed consideration of these differences can be found in the sectionInterface Consideration.

Routines with Simplified Interface

Some software packages and libraries (PARDISO package used in the Intel MKL, Sparskit 2[Saad94], Compaq Extended Math Library (CXML)[CXML01]) use different (early) variation ofthe CSR format and support only level 2 operations with simplified interfaces. Intel MKL providesa set of level 2 routines with similar simplified interfaces. Each of these routines operates ona matrix of the fixed type. The following operations are supported:

y :=op(A)*x (general and symmetric matrices)

y := inv(op(A))*x (triangular matrices)

Matrix type is indicated by the field <mtype> in the routine name (see section NamingConventions in Sparse BLAS Level 2 and Level 3).

The detail consideration of interfaces for these routines is given in the Interface Considerationsection.

These routines can operate only with four sparse data storage formats, specifically:

CSR format in variation accepted in PARDISO and CXML;

DIA format accepted in CXML;

COO format.

194

BSR format.

Note that routines in both groups described above use the same computational kernel routinesthat work with certain internal data structures.

Interface Consideration

Differences Between Intel MKL and NIST Interfaces

The Intel MKL Sparse BLAS Level 3 routines have the following standard interfaces:

mkl_xyyymm(transa, m, n, k, alpha, matdescra, arg(A), b, ldb, beta, c, ldc),for matrix-matrix product;

mkl_xyyysm(transa, m, n, alpha, matdescra, arg(A), b, ldb, c, ldc), for triangularsolvers with multiple right-hand sides.

Here x denotes data type, and yyy - sparse matrix data structure (storage format).

The analogous NIST Sparse BLAS (NSB) library routines have the following interfaces:

xyyymm(transa, m, n, k, alpha, descra, arg(A), b, ldb, beta, c, ldc, work,lwork), for matrix-matrix product;

xyyysm(transa, m, n, unitd, dv, alpha, descra, arg(A), b, ldb, beta, c, ldc,work, lwork), for triangular solvers with multiple right-hand sides.

Some similar arguments are used in both libraries. The argument transa indicates how tooperate with the matrix and is slightly different in the NSB library (see Table 2-5 ). Thearguments m and k are the number of rows and column in the matrix A, respectively, n is thenumber of columns in the matrix C. The arguments alpha and beta are scalar alpha and betarespectively. (beta is not used in the Intel MKL triangular solvers.) The arguments b and c arerectangular arrays with the first dimension ldb and ldc, respectively. The symbol arg(A)denotes the list of arguments that describe the sparse representation of A.

Table 2-5 Parameter transa

OperationNSB interfaceMKL interface

INTEGERCHARACTER*1data type

op(A) = A0N or nvalue

195


OperationNSB interfaceMKL interface

op(A) = A'1T or t

op(A) = A'2C or c

The argument matdescra describes the relevant characteristics of the matrix A.

It corresponds to the argument descra from NSB library (see Table 2-6 for more details).

Table 2-6 Possible Values of the Parameter matdescra (descra)

Matrix characteristicsNSBinterface

MKL interface

zero-basedindexing

one-basedindexing

INTEGERCharCHARACTERdata type

matrix structuredescra(1)Matdescra[0]matdescra(1)1st element

general0GGvalue

symmetric (A = A')1SS

Hermitian (A=conjg(A'))2HH

triangular3TT

skew(anti)-symmetric (A=-A')4AA

diagonal5DD

upper/lower triangular indicatordescra(2)Matdescra[1]matdescra(2)2nd element

lower1LLvalue

upper2UU

main diagonal typedescra(3)Matdescra[2]matdescra(3)3rd element

196


Matrix characteristicsNSBinterface

MKL interface

non-unit0NNvalue

unit1UU

type of indexingMatdescra[3]matdescra(4)4th element

one-based indexingFvalue

zero-based indexingC

Note that matdescra has some specifics in the Intel MKL routines. In particular, for routinesthat perform matrix-matrix and matrix-vector multiplication, they are as follows:

for general matrices (matdescra(1)='G', values of matdescra(2) and matdescra(3) areignored;

for skew-symmetrical matrices (matdescra(1)='A', values of matdescra(3) are ignored;

for diagonal matrices (matdescra(1)='D', values of matdescra(2) are ignored;

If matdescra(1) is not set to 'G' or 'T', and matdescra(2) and matdescra(3) are notdefined, then the following default values are assigned: matdescra(2)='L' andmatdescra(3)='N';

matdescra(1)='G' is not supported for the routines operating with the skyline storageformat.

For triangular solvers if matdescra(1)='D', then matdescra(2) is ignored.

For triangular solvers Intel MKL supports only matdescra(1)=T,D;

For both multiplication routines and triangular solvers when matdescra(3)='U', and the sparsematrix is not in the skyline format, then non-zero diagonal elements can be stored in the sparserepresentation even if they are non-unit; when the sparse matrix is in the skyline format, thediagonal elements must be stored in the sparse representation even if they are zero.

The current version of NSB library supports only descra(1) for matrix-matrix multiplication;descra(2), descra(3) are supported for triangular solvers only if descra(1)=3.

The argument work is a work array, and lwork is its dimension.

These arguments are not used in the Intel MKL.

197


The arguments unitd and dv are used only in NSB triangular solvers. First of them indicateswhether or not the diagonal matrix D is unitary. If unitd=1, D is the identity matrix. The lineararray dv contains the diagonal scaling matrix D if the argument unitd = 2 (the rows of A arescaled) or unitd = 3 (the columns of A are scaled).

Simplified Interfaces

The Intel MKL Sparse BLAS Level 2 routines with simplified interfaces have the followinginterfaces (x denotes data type, and yyy - sparse matrix storage format):

one-based indexing

mkl_xyyygemv(transa, m, arg(A), x, y), matrix-vector product for general sparsematrices;

mkl_xyyysymv(uplo, transa, m, arg(A), x, y), matrix-vector product for symmetricalsparse matrix;

mkl_xyyytrsv(uplo, transa, diag, m, arg(A), x, y) solution of the systems ofequations with a sparse triangular matrix.

zero-based indexing

mkl_cspblas_xyyygemv(transa, m, arg(A), x, y), matrix-vector product for generalsparse matrices;

mkl_cspblas_xyyysymv(uplo, transa, m, arg(A), x, y), matrix-vector product forsymmetrical sparse matrix;

mkl_cspblas_xyyytrsv(uplo, transa, diag, m, arg(A), x, y) solution of thesystems of equations with a sparse triangular matrix.

The argument transa indicates how to operate with the matrix (see Table 2-5 ). The argumentuplo specifies whether an upper or low triangle of the sparse matrix will be considered. Theargument diag specifies whether A is a unit triangular or not. The arguments m is the numberof rows in the matrix A. The arg(A) denotes the list of arguments that describe the sparserepresentation of A.

The array x contains the input vector, and the array y contains the result of the performedoperation.

Note that all routines for matrix-vector multiplication are able to extract triangles and/or amain diagonal from a sparse representation of the matrix A.

198


Operations with Partial Matrices

One of the distinctive feature of the Intel MKL Sparse BLAS routines is a possibility to performoperations only on certain parts (triangles and main diagonal) of the input sparse matrixspecifying the parameter matdescra. Assume that the sparse matrix A can be decomposed as

A = L + D + U

where L is the strict lower triangle of A, U is the strict upper triangle of A, D is the main diagonal.

Table 2-7 shows correspondence between the output matrix for matrix-matrix multiplicationroutines and values of matdescra for real sparse matrix A. Analogous correspondence existsfor matrix-vector multiplication routines.

Table 2-7 Correspondence Between Output Matrix and Values of matdescra (Routinesfor Matrix-Matrix Multiplication)

Output Matrixmatdescra(3)matdescra(2)matdescra(1)

alpha*op(A)*B + beta*CignoredignoredG

alpha*op(L+D+L')*B + beta*CNLS or H

alpha*op(L+I+L')*B + beta*CULS or H

alpha*op(U'+D+U)*B + beta*CNUS or H

alpha*op(U'+I+U)*B + beta*CUUS or H

alpha*op(L+I)*B + beta*CULT

alpha*op(L+D)*B + beta*CNLT

alpha*op(U+I)*B + beta*CUUT

alpha*op(U+D)*B + beta*CNUT

alpha*op(L-L')*B + beta*CignoredLA

alpha*op(U-U')*B + beta*CignoredUA

alpha*D*B + beta*CNignoredD

alpha*B + beta*CUignoredD

199


Table 2-8 shows correspondence between the output matrix for triangular solvers and valuesof matdescra for real sparse matrix A.

Table 2-8 Correspondence Between Output Matrix and Values of matdescra (TriangularSolvers)

Output Matrixmatdescra(3)matdescra(2)matdescra(1)

alpha*inv(op(L+L))*BNLT

alpha*inv(op(L+L))*BULT

alpha*inv(op(U+U))*BNUT

alpha*inv(op(U+U))*BUUT

alpha*inv(D)*BNignoredD

alpha*BUignoredD

Restrictions for Triangular Solver Routines

There are important restrictions for all Intel MKL triangular solvers, specifically:

Column indices for the compressed sparse row format must be sorted in increasing orderfor each row;

Row indices for the compressed sparse column format must be sorted in increasing orderfor each column;

For the diagonal format, elements of the array containing the diagonal numbers of thenon-zero diagonals of a sparse matrix must be sorted in increasing order.

Sparse BLAS Level 2 and Level 3 Routines.

Table 2-9 lists the sparse BLAS Level 2 and Level 3 routines described in more detail later inthis section.

200


Table 2-9 Sparse BLAS Level 2 and Level 3 Routines

DescriptionRoutine/Function

Level 2

Computes matrix - vector product of a sparse matrix stored in theCSR format.

mkl_dcsrmv

Computes matrix - vector product of a sparse general matrix storedin the CSR format (PARDISO variation)

mkl_dcsrgemv

Computes matrix - vector product of a sparse symmetrical matrixstored in the CSR format (PARDISO variation)

mkl_dcsrsymv

Computes matrix - vector product of a sparse symmetrical matrixstored in the CSR format (PARDISO variation) with zero-basedindexing

mkl_cspblas_dcsrsymv

Computes matrix - vector product for a sparse matrix in CSC format.mkl_dcscmv

Computes matrix - vector product for a sparse matrix in thecoordinate format.

mkl_dcoomv

Computes matrix - vector product of a sparse general matrix storedin the coordinate format.

mkl_dcoogemv

Computes matrix - vector product of a sparse symmetrical matrixstored in the coordinate format.

mkl_dcoosymv

Computes matrix - vector product of a sparse matrix stored in thediagonal format.

mkl_ddiamv

Computes matrix - vector product of a sparse general matrix storedin the diagonal format.

mkl_ddiagemv

Computes matrix - vector product of a sparse symmetrical matrixstored in the diagonal format.

mkl_ddiasymv

Computes matrix - vector product for a sparse matrix in the skylinestorage format.

mkl_dskymv

Computes matrix - vector product of a sparse matrix stored in theBSR format.

mkl_dbsrmv

201



Computes matrix - vector product of a sparse symmetrical matrixstored in the BSR format (PARDISO variation).

mkl_dbsrsymv

Computes matrix - vector product of a sparse symmetrical matrixstored in the BSR format (PARDISO variation) with zero-basedindexing.

mkl_cspblas_dbsrsymv

Solves a system of linear equations for a sparse matrix in the CSRformat.

mkl_dcsrsv

Triangular solvers with simplified interface for a sparse matrix inthe CSR format (PARDISO variation).

mkl_dcsrtrsv

Solves a system of linear equations for a sparse matrix in thecompressed sparse column format.

mkl_dcscsv

Solves a system of linear equations for a sparse matrix in thecoordinate format.

mkl_dcoosv

Triangular solvers with simplified interface for a sparse matrix inthe coordinate format.

mkl_dcootrsv

Solves a system of linear equations for a sparse matrix in thediagonal format.

mkl_ddiasv

Triangular solvers with simplified interface for a sparse matrix inthe diagonal format.

mkl_ddiatrsv

Solves a system of linear equations for a sparse matrix in the skylineformat.

mkl_dskysv

Level 3

Computes matrix - matrix product of a sparse matrix stored in thecompressed sparse row format

mkl_dcsrmm

Computes matrix - matrix product of a sparse matrix stored in thecompressed sparse column format

mkl_dcscmm

Computes matrix - matrix product of a sparse matrix stored in thecoordinate format.

mkl_dcoomm

202



Computes matrix - matrix product of a sparse matrix stored in thediagonal format.

mkl_ddiamm

Computes matrix - matrix product of a sparse matrix stored in theskyline storage format.

mkl_dskymm

Solves a system of linear matrix equations for a sparse matrix inthe CSR format.

mkl_dcsrsm

Solves a system of linear matrix equations for a sparse matrix inthe CSC format.

mkl_dcscsm

Solves a system of linear matrix equations for a sparse matrix inthe coordinate format.

mkl_dcoosm

Solves a system of linear matrix equations for a sparse matrix inthe diagonal format.

mkl_ddiasm

Solves a system of linear matrix equations for a sparse matrixstored in the skyline storage format.

mkl_dskysm

mkl_dcsrmvComputes matrix - vector product of a sparsematrix stored in the CSR format.

Syntax

Fortran:

call mkl_dcsrmv(transa, m, k, alpha, matdescra, val, indx, pntrb, pntre, x,beta, y)

C:

mkl_dcsrmv(&transa, &m, &k, &alpha, matdescra, val, indx, pntrb, pntre, x,&beta, y);

203


Description

The mkl_dcsrmv routine performs a matrix-vector operation defined as


or


where:



A is an m-by-k sparse matrix in the CSR format, A' is the transpose of A.

Input Parameters

Parameter descriptions are common for all implemented interfaces with the exception of datatypes that refer here to the Fortran 77 standard types. Data types specific to the differentinterfaces are described in the section “Interfaces” below.

CHARACTER*1. Specifies the operation to be performed.transaIf transa = 'N' or 'n', the matrix-vector product iscomputed as y := alpha*A*x + beta*yIf transa = 'T' or 't' or 'C' or 'c', the matrix-vectorproduct is computed as y := alpha*A'*x + beta*y,

INTEGER. Number of rows of the matrix A.m

INTEGER. Number of columns of the matrix A.k

REAL*8. Specifies the scalar alpha.alpha

CHARACTER. Array of six elements, specifies properties ofthe matrix used for operation. Only first four array elementsare used, their possible values are given in the Table 2-6 .

matdescra

REAL*8. Array containing non-zero elements of the matrixA. Its length is pntre(m - pntrb(1).

val

Refer to values array description in CSR Format for moredetails.

INTEGER. Array containing the column indices for eachnon-zero element of the matrix A.Its length is equal to lengthof the val array.

indx

204


Refer to columns array description in CSR Format for moredetails.

INTEGER. Array of length m, contains row indices, such thatpntrb(i) - pntrb(1)+1 is the first index of row i in thearrays val and indx. Refer to pointerb array descriptionin CSR Format for more details.

pntrb

INTEGER. Array of length m, contains row indices, such thatpntre(i) - pntrb(1) is the last index of row i in thearrays val and indx. Refer to pointerE array descriptionin CSR Format for more details.

pntre

REAL*8.xArray, DIMENSION at least k if transa = 'N' or 'n' andat least m otherwise. Before entry, the array x must containthe vector x.

REAL*8. Specifies the scalar beta.beta

REAL*8.yArray, DIMENSION at least m if transa = 'N' or 'n' andat least k otherwise. Before entry, the array y must containthe vector y.

Output Parameters


Interfaces

Fortran 77:SUBROUTINE mkl_dcsrmv(transa, m, k, alpha, matdescra, val, indx,

pntrb, pntre, x, beta, y)

CHARACTER*1 transa

CHARACTER matdescra(*)

205


INTEGER m, k

INTEGER indx(*), pntrb(m), pntre(m)

REAL*8 alpha, beta

REAL*8 val(*), x(*), y(*)

C:void mkl_dcsrmv(char *transa, int *m, int *k, double *alpha, char *matdescra,

double *val, int *indx, int *pntrb, int *pntre, double *x, double *beta,double *y);

mkl_dcsrgemvComputes matrix - vector product of a sparsegeneral matrix stored in the CSR format (PARDISOvariation).

Syntax

Fortran:

call mkl_dcsrgemv(transa, m, a, ia, ja, x, y)

C:

mkl_dcsrgemv(&transa, &m, a, ia, ja, x, y);

Description

The mkl_dcsrgemv routine performs a matrix-vector operation defined as

y := A*x

or

y := A'*x,

where:


A is an m-by-m sparse square matrix in the CSR format (PARDISO variation), A' is the transposeof A.

206


Input Parameters


CHARACTER*1. Specifies the operation to be performed.transaIf transa = 'N' or 'n', the matrix-vector product iscomputed as y := A*xIf transa = 'T' or 't' or 'C' or 'c', the matrix-vectorproduct is computed as y := A'*x,


REAL*8. Array containing non-zero elements of the matrixA. Its length is equal to the number of non-zero elementsin the matrix A. Refer to values array description in SparseMatrix Storage Formats for more details.

a

INTEGER. Array of length m + 1, containing indices ofelements in the array a, such that ia(i) is the index in thearray a of the first non-zero element from the row i. The

ia

value of the last element ia(m + 1)-1 is equal to thenumber of non-zeros plus one. Refer to rowIndex arraydescription in Sparse Matrix Storage Formats for moredetails.

REAL*8. Array containing the column indices for eachnon-zero element of the matrix A.

ja

Its length is equal to the length of the array a. Refer tocolumns array description in Sparse Matrix Storage Formatsfor more details.

REAL*8.xArray, DIMENSION is m.Before entry, the array x must contain the vector x.

Output Parameters

REAL*8.yArray, DIMENSION at least m.On exit, the array y must contain the vector y.

207


Interfaces

Fortran 77:SUBROUTINE mkl_dcsrgemv(transa, m, a, ia, ja, x, y)

CHARACTER*1 transa

INTEGER m

INTEGER ia(*), ja(*)

REAL*8 a(*), x(*), y(*)

C:void mkl_dcsrgemv(char *transa, int *m, double *a, int *ia, int *ja, double*x, double *y);

mkl_dcsrsymvComputes matrix - vector product of a sparsesymmetrical matrix stored in the CSR format(PARDISO variation).

Syntax

Fortran:

call mkl_dcsrsymv(uplo, m, a, ia, ja, x, y)

C:

mkl_dcsrsymv(&uplo, &m, a, ia, ja, x, y);

Description

The mkl_dcsrsymv routine performs a matrix-vector operation defined as

y := A*x

or

y := A'*x,

where:


208


A is an upper or lower triangle of the symmetrical sparse matrix in the CSR format (PARDISOvariation), A' is the transpose of A.

Input Parameters


CHARACTER*1. Specifies whether the upper or low triangleof the matrix A is considered.

uplo

If uplo = 'U' or 'u', the upper triangle of the matrix A isused.If uplo = 'L' or 'l', the low triangle of the matrix A isused.



a


ia



ja



209


Output Parameters


Interfaces

Fortran 77:SUBROUTINE mkl_dcsrsymv(uplo, m, a, ia, ja, x, y)

CHARACTER*1 uplo

INTEGER m


REAL*8 a(*), x(*), y(*)

C:void mkl_dcsrsymv(char *uplo, int *m, double *a, int *ia, int *ja, double*x, double *y);

mkl_cspblas_dcsrsymvComputes matrix - vector product of a sparsesymmetrical matrix stored in the CSR format(PARDISO variation) with zero-based indexing.

Syntax

Fortran:

call mkl_cspblas_dcsrsymv(uplo, m, a, ia, ja, x, y)

C:

mkl_cspblas_dcsrsymv(&uplo, &m, a, ia, ja, x, y);

Description

The mkl_cspblas_dcsrsymv routine performs a matrix-vector operation defined as

y := A*x

210


or

y := A'*x,

where:


A is an upper or lower triangle of the symmetrical sparse matrix in the CSR format (PARDISOvariation) with zero-based indexing, A' is the transpose of A.

Input Parameters



uplo




a


ia

value of the last element ia(m + 1) is equal to the numberof non-zeros. Refer to rowIndex array description in SparseMatrix Storage Formats for more details.


ja


REAL*8.x

211


Array, DIMENSION is m.Before entry, the array x must contain the vector x.

Output Parameters


Interfaces

Fortran 77:SUBROUTINE mkl_cspblas_dcsrsymv(uplo, m, a, ia, ja, x, y)

CHARACTER*1 uplo

INTEGER m


REAL*8 a(*), x(*), y(*)

C:void mkl_cspblas_dcsrsymv(char *uplo, int *m, double *a, int *ia, int *ja,double *x, double *y);

mkl_dcscmvComputes matrix - vector product for a sparsematrix in the compressed sparse column format.

Syntax

Fortran:

call mkl_dcscmv(transa, m, k, alpha, matdescra, val, indx, pntrb, pntre, x,beta, y)

C:

mkl_dcscmv(&transa, &m, &k, &alpha, matdescra, val, indx, pntrb, pntre, x,&beta, y);

212


Description

The mkl_dcscmv routine performs a matrix-vector operation defined as


or


where:



A is an m-by-k sparse matrix in compressed sparse column format, A' is the transpose of A.

Input Parameters







matdescra

REAL*8. Array containing non-zero elements of the matrixA. Its length is pntre(k) - pntrb(1).

val

Refer to values array description in CSC Format for moredetails.

INTEGER. Array containing the row indices for each non-zeroelement of the matrix A.Its length is equal to length of theval array.

indx

213


Refer to rows array description in CSC Format for moredetails.

INTEGER. Array of length k, contains row indices, such thatpntrb(i) - pntrb(1)+1 is the starting index of columni in the arrays val and indx. Refer to pointerb arraydescription in CSC Format for more details.

pntrb

INTEGER. Array of length k, contains row indices, such thatpntre(i) - pntrb(1) is the last index of column i in thearrays val and indx. Refer to pointerE array descriptionin CSC Format for more details.

pntre




Output Parameters


Interfaces

Fortran 77:SUBROUTINE mkl_dcscmv(transa, m, k, alpha, matdescra, val, indx, pntrb,pntre, x, beta, y)

CHARACTER*1 transa


214


INTEGER m, k, ldb, ldc


REAL*8 alpha, beta

REAL*8 val(*), x(*), y(*)

C:void mkl_dcscmv(char *transa, int *m, int *k, double *alpha, char *matdescra,double *val, int *indx, int *pntrb, int *pntre, double *x, double *beta,double *y);

mkl_dcoomvComputes matrix - vector product for a sparsematrix in the coordinate format.

Syntax

Fortran:

call mkl_dcoomv(transa, m, k, alpha, matdescra, val, rowind, colind, nnz, x,beta, y)

C:

mkl_dcoomv(&transa, &m, &k, &alpha, matdescra, val, rowind, colind, &nnz, x,&beta, y);

Description

The mkl_dcoomv routine performs a matrix-vector operation defined as


or


where:



A is an m-by-k sparse matrix in compressed coordinate format, A' is the transpose of A.

215


Input Parameters







matdescra

REAL*8. Array of length nnz, contains non-zero elementsof the matrix A in the arbitrary order.

val

Refer to values array description in Coordinate Format formore details.

INTEGER. Array of length nnz, contains the row indices foreach non-zero element of the matrix A.

rowind

Refer to rows array description in Coordinate Format formore details.

INTEGER. Array of length nnz, contains the column indicesfor each non-zero element of the matrix A. Refer to columnsarray description in Coordinate Format for more details.

colind

INTEGER. Specifies the number of non-zero element of thematrix A.

nnz

Refer to nnz description in Coordinate Format for moredetails.



216



Output Parameters


Interfaces

Fortran 77:SUBROUTINE mkl_dcoomv(transa, m, k, alpha, matdescra, val, rowind, colind,nnz, x, beta, y)

CHARACTER*1 transa


INTEGER m, k, nnz

INTEGER rowind(*), colind(*)

REAL*8 alpha, beta

REAL*8 val(*), x(*), y(*)

C:void mkl_dcoomv(char *transa, int *m, int *k, double *alpha, char *matdescra,double *val, int *rowind, int *colind, int *nnz, double *x, double *beta,double *y);

mkl_dcoogemvComputes matrix - vector product of a sparsegeneral matrix stored in the coordinate format.

Syntax

Fortran:

call mkl_dcoogemv(transa, m, val, rowind, colind, nnz, x, y)

217


C:

mkl_dcoogemv(&transa, &m, val, rowind, colind, &nnz, x, y);

Description

The mkl_dcoogemv routine performs a matrix-vector operation defined as

y := A*x

or

y := A'*x,

where:


A is an m-by-m sparse square matrix in the coordinate format, A' is the transpose of A.

Input Parameters





val



rowind



colind

218



nnz



Output Parameters


Interfaces

Fortran 77:SUBROUTINE mkl_dcoogemv(transa, m, val, rowind, colind, nnz, x, y)

CHARACTER*1 transa

INTEGER m, nnz


REAL*8 val(*), x(*), y(*)

C:void mkl_dcoogemv(char *transa, int *m, double *val, int *rowind, int*colind, int *nnz, double *x, double *y);

mkl_dcoosymvComputes matrix - vector product of a sparsesymmetrical matrix stored in the coordinate format.

Syntax

Fortran:

call mkl_dcoosymv(uplo, m, val, rowind, colind, nnz, x, y)

219


C:

mkl_dcoosymv(&uplo, &m, val, rowind, colind, &nnz, x, y);

Description

The mkl_dcoosymv routine performs a matrix-vector operation defined as

y := A*x

or

y := A'*x,

where:


A is an upper or lower triangle of the symmetrical sparse matrix in the coordinate format, A'is the transpose of A.

Input Parameters



uplo




val



rowind


220



colind


nnz



Output Parameters


Interfaces

Fortran 77:SUBROUTINE mkl_dcoosymv(uplo, m, val, rowind, colind, nnz, x, y)

CHARACTER*1 uplo

INTEGER m, nnz


REAL*8 val(*), x(*), y(*)

C:void mkl_dcoosymv(char *uplo, int *m, double *val, int *rowind, int *colind,int *nnz, double *x, double *y);

221


mkl_ddiamvComputes matrix - vector product for a sparsematrix in the diagonal format.

Syntax

Fortran:

call mkl_ddiamv(transa, m, k, alpha, matdescra, val, lval, idiag, ndiag, x,beta, y)

C:

mkl_ddiamv(&transa, &m, &k, &alpha, matdescra, val, &lval, idiag, &ndiag, x,&beta, y);

Description

The mkl_ddiamv routine performs a matrix-vector operation defined as


or


where:



A is an m-by-k sparse matrix stored in the diagonal format, A' is the transpose of A.

Input Parameters


CHARACTER*1. Specifies the operation to be performed.transaIf transa = 'N' or 'n', the matrix-vector product iscomputed as y := alpha*A*x + beta*y,If transa = 'T' or 't' or 'C' or 'c', the matrix-vectorproduct is computed as y := alpha*A'*x + beta*y.


222





matdescra

REAL*8. Two-dimensional array of size lval by ndiag,contains non-zero diagonals of the matrix A. Refer to valuesarray description in Diagonal Storage Scheme for moredetails.

val

INTEGER. Leading dimension of val, lval ≥min(m, k).Refer to lval description in Diagonal Storage Scheme formore details.

lval

INTEGER. Array of length ndiag, contains the distancesbetween main diagonal and each non-zero diagonals in thematrix A.

idiag

Refer to distance array description in Diagonal StorageScheme for more details.

INTEGER. Specifies the number of non-zero diagonals of thematrix A.

ndiag

REAL*8.xArray, DIMENSION at least k if transa = 'N' or 'n', andat least m otherwise. Before entry, the array x must containthe vector x.


REAL*8.yArray, DIMENSION at least m if transa = 'N' or 'n', andat least k otherwise. Before entry, the array y must containthe vector y.

Output Parameters


223


Interfaces

Fortran 77:SUBROUTINE mkl_ddiamv(transa, m, k, alpha, matdescra, val, lval, idiag,ndiag, x, beta, y)

CHARACTER*1 transa


INTEGER m, k, lval, ndiag

INTEGER idiag(*)

REAL*8 alpha, beta

REAL*8 val(lval,*), x(*), y(*)

C:void mkl_ddiamv(char *transa, int *m, int *k, double *alpha, char *matdescra,double *val, int *lval, int *idiag, int *ndiag, double *x, double *beta,double *y);

mkl_ddiagemvComputes matrix - vector product of a sparsegeneral matrix stored in the diagonal format.

Syntax

Fortran:

call mkl_ddiagemv(transa, m, val, lval, idiag, ndiag, x, y)

C:

mkl_ddiagemv(&transa, &m, val, &lval, idiag, &ndiag, x, y);

Description

The mkl_ddiagemv routine performs a matrix-vector operation defined as

y := A*x

224


or

y := A'*x,

where:


A is an m-by-m sparse square matrix in the diagonal storage format, A' is the transpose of A.

Input Parameters





val

INTEGER. Leading dimension of val, lval≥min(m, k).Refer to lval description in Diagonal Storage Scheme formore details.

lval


idiag



ndiag


225


Output Parameters


Interfaces

Fortran 77:SUBROUTINE mkl_ddiagemv(transa, m, val, lval, idiag, ndiag, x, y)

CHARACTER*1 transa

INTEGER m, lval, ndiag

INTEGER idiag(*)


C:void mkl_ddiagemv(char *transa, int *m, double *val, int *lval, int *idiag,int *ndiag, double *x, double *y);

mkl_ddiasymvComputes matrix - vector product of a sparsesymmetrical matrix stored in the diagonal format.

Syntax

Fortran:

call mkl_ddiasymv(uplo, m, val, lval, idiag, ndiag, x, y)

C:

mkl_ddiasymv(&uplo, &m, val, &lval, idiag, &ndiag, x, y);

Description

The mkl_ddiasymv routine performs a matrix-vector operation defined as

y := A*x

226


or

y := A'*x,

where:


A is an upper or lower triangle of the symmetrical sparse matrix, A' is the transpose of A.

Input Parameters



uplo




val

INTEGER. Leading dimension of val, val, lval ≥min(m,k). Refer to lval description in Diagonal Storage Schemefor more details.

lval


idiag



ndiag


227


Output Parameters


Interfaces

Fortran 77:SUBROUTINE mkl_ddiasymv(uplo, m, val, lval, idiag, ndiag, x, y)

CHARACTER*1 uplo


INTEGER idiag(*)


C:void mkl_ddiasymv(char *uplo, int *m, double *val, int *lval, int *idiag,int *ndiag, double *x, double *y);

mkl_dskymvComputes matrix - vector product for a sparsematrix in the skyline storage format.

Syntax

Fortran:

call mkl_dskymv(transa, m, k, alpha, matdescra, val, pntr, x, beta, y)

C:

mkl_dskymv(&transa, &m, &k, &alpha, matdescra, val, pntr, x, &beta, y);

Description

The mkl_dskymv routine performs a matrix-vector operation defined as


228


or


where:



A is an m-by-k sparse matrix stored using the skyline storage scheme, A' is the transpose ofA.

Input Parameters







matdescra

REAL*8. Array containing the set of elements of the matrixA in the skyline profile form.

val

If matdescrsa(2)= 'L', then val contains elements fromthe low triangle of the matrix A.If matdescrsa(2)= 'U', then val contains elements fromthe upper triangle of the matrix A.Refer to values array description in Skyline Storage Schemefor more details.

INTEGER. Array of length (m+m) for lower triangle, and(k+k) for upper triangle.

pntr

229


It contains the indices specifying in the val the positions ofthe first element in each row (column) of the matrix A. Referto pointers array description in Skyline Storage Schemefor more details.




Output Parameters


Interfaces

Fortran 77:SUBROUTINE mkl_dskymv(transa, m, k, alpha, matdescra, val, pntr, x, beta,y)

CHARACTER*1 transa


INTEGER m, k

INTEGER pntr(*)

REAL*8 alpha, beta

REAL*8 val(*), x(*), y(*)

C:void mkl_dskymv (char *transa, int *m, int *k, double *alpha, char *matdescra,double *val, int *pntr, double *x, double *beta, double *y);

230


mkl_dbsrmvComputes matrix - vector product of a sparsematrix stored in the BSR format.

Syntax

Fortran:

call mkl_dbsrmv(transa, m, k, lb, alpha, matdescra, val, indx, pntrb, pntre,x, beta, y)

C:

mkl_dbsrmv(&transa, &m, &k, &lb, &alpha, matdescra, val, indx, pntrb, pntre,x, &beta, y);

Description

The mkl_dbsrmv routine performs a matrix-vector operation defined as


or


where:



A is an m-by-k block sparse matrix in the BSR format, A' is the transpose of A.

Input Parameters



INTEGER. Number of block rows of the matrix A.m

231


INTEGER. Number of block columns of the matrix A.k

INTEGER. Size of the block in the matrix A.lb



matdescra

REAL*8. Array containing elements of non-zero blocks ofthe matrix A. Its length is equal to the number number ofnon-zero blocks in the matrix A multiplied by lb*lb.

val

Refer to values array description in BSR Format for moredetails.

INTEGER. Array containing the column indices for eachnon-zero block in the matrix A. Its length is equal to thenumber of non-zero blocks in the matrix A.

indx

Refer to columns array description in BSR Format for moredetails.

INTEGER. Array of length m.pntrbFor one-based indexing: this array contains row indices,such that pntrb(i) - pntrb(1)+1 is the first index ofblock row i in the array indx.For zero-based indexing: this array contains row indices,such that pntrb(i) - pntrb(0) is the first index of blockrow i in the array indxRefer to pointerB array description in BSR Format for moredetails.

INTEGER. Array of length m.pntreFor one-based intexing this array contains row indices, suchthat pntre(i) - pntrb(1) is the last index of block rowi in the array indx.For zero-based intexing this array contains row indices, suchthat pntre(i) - pntrb(0)-1 is the last index of blockrow i in the array indx.Refer to pointerE array description in BSR Format for moredetails.

REAL*8.x

232


Array, DIMENSION at least (k*lb) if transa = 'N' or 'n',and at least (m*lb) otherwise. Before entry, the array xmust contain the vector x.


REAL*8.yArray, DIMENSION at least (m*lb) if transa = 'N' or 'n',and at least (k*lb) otherwise. Before entry, the array ymust contain the vector y.

Output Parameters


Interfaces

Fortran 77:SUBROUTINE mkl_dbsrmv(transa, m, k, lb, alpha, matdescra, val, indx,

pntrb, pntre, x, beta, y)

CHARACTER*1 transa


INTEGER m, k, lb


REAL*8 alpha, beta

REAL*8 val(*), x(*), y(*)

C:void mkl_dbsrmv(char *transa, int *m, int *k, int *lb, double *alpha, char*matdescra,

double *val, int *indx, int *pntrb, int *pntre, double *x, double *beta,double *y);

233


mkl_dbsrsymvComputes matrix - vector product of a sparsesymmetrical matrix stored in the BSR format(PARDISO variation).

Syntax

Fortran:

call mkl_dbsrsymv(uplo, m, lb, a, ia, ja, x, y)

C:

mkl_dbsrsymv(&uplo, &m, &lb, a, ia, ja, x, y);

Description

The mkl_dbsrsymv routine performs a matrix-vector operation defined as

y := A*x

or

y := A'*x,

where:


A is an upper or lower triangle of the symmetrical sparse matrix in the BSR format (PARDISOvariation), A' is the transpose of A.

Input Parameters



uplo


234




REAL*8. Array containing elements of non-zero blocks ofthe matrix A. Its length is equal to the number of non-zeroblocks in the matrix A multiplied by lb*lb. Refer to valuesarray description in BSR Format for more details.

a

INTEGER. Array of length (m + 1), containing indices ofblock in the array a, such that ia(i) is the index in thearray a of the first non-zero element from the row i. The

ia

value of the last element ia(m + 1) is equal to the numberof non-zero blocks plus one. Refer to rowIndex arraydescription in BSR Format for more details.

REAL*8. Array containing the column indices for eachnon-zero block in the matrix A.

ja

Its length is equal to the number of non-zero blocks of thematrix A. Refer to columns array description in BSR Formatfor more details.

REAL*8.xArray, DIMENSION (m*lb).Before entry, the array x must contain the vector x.

Output Parameters

REAL*8.yArray, DIMENSION at least (m*lb).On exit, the array y must contain the vector y.

Interfaces

Fortran 77:SUBROUTINE mkl_dbsrsymv(uplo, m, lb, a, ia, ja, x, y)

CHARACTER*1 uplo

INTEGER m, lb


REAL*8 a(*), x(*), y(*)

235


C:void mkl_dbsrsymv(char *uplo, int *m, int *lb, double *a, int *ia, int *ja,double *x, double *y);

mkl_cspblas_dbsrsymvComputes matrix - vector product of a sparsesymmetrical matrix stored in the BSR format(PARDISO variation) with zero-based indexing.

Syntax

Fortran:

call mkl_cspblas_dbsrsymv(uplo, m, lb, a, ia, ja, x, y)

C:

mkl_cspblas_dbsrsymv(&uplo, &m, &lb, a, ia, ja, x, y);

Description

The mkl_cspblas_dbsrsymv routine performs a matrix-vector operation defined as

y := A*x

or

y := A'*x,

where:


A is an upper or lower triangle of the symmetrical sparse matrix in the BSR format (PARDISOvariation) with zero-based indexing, A' is the transpose of A.

Input Parameters



uplo

236





REAL*8. Array containing elements of non-zero blocks ofthe matrix A. Its length is equal to the number of non-zeroblocks in the matrix A multiplied by lb*lb. Refer to valuesarray description in BSR Format for more details.

a

INTEGER. Array of length (m + 1), containing indices ofblock in the array a, such that ia(i) is the index in thearray a of the first non-zero element from the row i. The

ia

value of the last element ia(m + 1) is equal to the numberof non-zero blocks plus one. Refer to rowIndex arraydescription in BSR Format for more details.

REAL*8. Array containing the column indices for eachnon-zero block in the matrix A.

ja

Its length is equal to the number of non-zero blocks of thematrix A. Refer to columns array description in BSR Formatfor more details.

REAL*8.xArray, DIMENSION (m*lb).Before entry, the array x must contain the vector x.

Output Parameters

REAL*8.yArray, DIMENSION at least (m*lb).On exit, the array y must contain the vector y.

237


Interfaces

Fortran 77:SUBROUTINE mkl_cspblas_dbsrsymv(uplo, m, lb, a, ia, ja, x, y)

CHARACTER*1 uplo

INTEGER m, lb


REAL*8 a(*), x(*), y(*)

C:void mkl_cspblas_dbsrsymv(char *uplo, int *m, int *lb, double *a, int *ia,int *ja, double *x, double *y);

mkl_dcsrsvSolves a system of linear equations for a sparsematrix in the CSR format.

Syntax

Fortran:

call mkl_dcsrsv(transa, m, alpha, matdescra, val, indx, pntrb, pntre, x, y)

C:

mkl_dcsrsv(&transa, &m, &alpha, matdescra, val, indx, pntrb, pntre, x, y);

Description

The mkl_dcsrsv routine solves a system of linear equations with matrix-vector operations fora sparse matrix in the CSR format:

y := alpha*inv(A)*x

or

y := alpha*inv(A')*x,

where:

238


alpha is scalar, x and y are vectors, A is a sparse upper or lower triangular matrix with unit ornon-unit main diagonal, A' is the transpose of A.

Input Parameters


CHARACTER*1. Specifies the operation to be performed.transaIf transa = 'N' or 'n', y := alpha*inv(A)*xIf transa = 'T' or 't' or 'C' or 'c', y :=alpha*inv(A')*x,

INTEGER. Number of columns of the matrix A.m



matdescra

REAL*8. Array containing non-zero elements of the matrixA. Its length is pntre(m) - pntrb(1).

val



indx


INTEGER. Array of length m, contains row indices, such thatpntrb(i) - pntrb(1)+1 is the starting index of row i inthe arrays val and indx. Refer to pointerb array descriptionin CSR Format for more details.

pntrb


pntre

REAL*8.xArray, DIMENSION at least m.

239


Before entry, the array x must contain the vector x. Theelements are accessed with unit increment.

REAL*8.yArray, DIMENSION at least m.Before entry, the array y must contain the vector y. Theelements are accessed with unit increment.

Output Parameters

Contains solution vector x.y

Interfaces

Fortran 77:SUBROUTINE mkl_dcsrsv(transa, m, alpha, matdescra, val, indx, pntrb, pntre,x, y)

CHARACTER*1 transa


INTEGER m


REAL*8 alpha

REAL*8 val(*)

REAL*8 x(*), y(*)

C:void mkl_dcsrsv(char *transa, int *m, double *alpha, char *matdescra, double*val, int *indx, int *pntrb, int *pntre, double *x, double *y);

240


mkl_dcsrtrsvTriangular solvers with simplified interface for asparse matrix in the CSR format (PARDISOvariation).

Syntax

Fortran:

call mkl_dcsrtrsv(uplo, transa, diag, m, a, ia, ja, x, y)

C:

mkl_dcsrtrsv(&uplo, &transa, &diag, &m, a, ia, ja, x, y);

Description

The mkl_dcsrtrsv routine solves a system of linear equations with matrix-vector operationsfor a sparse matrix stored in the CSR format accepted in PARDISO:

A*y = x

or

A'*y = x,

where:


A is a sparse upper or lower triangular matrix with unit or non-unit main diagonal, A' is thetranspose of A.

Input Parameters



uplo


241


CHARACTER*1. Specifies the operation to be performed.transaIf transa = 'N' or 'n', A*y = xIf transa = 'T' or 't' or 'C' or 'c', A'*y = x,

CHARACTER*1. Specifies whether A is a unit triangular ornot.

diag

If diag = 'U' or 'u', A is assumed to be a unit triangular.If diag = 'N' or 'n', A is not assumed to be a unittriangular.



a


ia



ja



Output Parameters

REAL*8.yArray, DIMENSION at least m.Contains the vector y.

242


Interfaces

Fortran 77:SUBROUTINE mkl_dcsrtrsv(uplo, transa, diag, m, a, ia, ja, x, y)

CHARACTER*1 uplo, transa, diag

INTEGER m


REAL*8 a(*), x(*), y(*)

C:void mkl_dcsrtrsv(char *uplo, char *transa, char *diag, int *m, double *a,int *ia, int *ja, double *x, double *y);

mkl_dcscsvSolves a system of linear equations for a sparsematrix in the CSC format.

Syntax

Fortran:

call mkl_dcscsv(transa, m, alpha, matdescra, val, indx, pntrb, pntre, x, y)

C:

mkl_dcscsv(&transa, &m, &alpha, matdescra, val, indx, pntrb, pntre, x, y);

Description

The mkl_dcsrsv routine solves a system of linear equations with matrix-vector operations fora sparse matrix in the CSC format:

y := alpha*inv(A)*x

or

y := alpha*inv(A')* x,

where:

243



Input Parameters


CHARACTER*1. Specifies the operation to be performed.transaIf transa = 'N' or 'n', y := alpha*inv(A)*xIf transa= 'T' or 't' or 'C' or 'c', y :=alpha*inv(A')* x,




matdescra


val


INTEGER. Array containing the column indices for eachnon-zero element of the matrix A. Its length is equal tolength of the val array.

indx

Refer to columns array description in CSC Format for moredetails.

INTEGER. Array of length m, contains row indices, such thatpntrb(i) - pntrb(1)+1 is the starting index of row i inthe arrays val and indx. Refer to pointerb array descriptionin CSC Format for more details.

pntrb

INTEGER. Array of length m, contains row indices, such thatpntre(i) - pntrb(1) is the last index of row i in thearrays val and indx. Refer to pointerE array descriptionin CSC Format for more details.

pntre

REAL*8.xArray, DIMENSION at least m.

244


Before entry, the array x must contain the vector x. Theelements are accessed with unit increment.


Output Parameters

Contains the solution vector x.y

Interfaces

Fortran 77:SUBROUTINE mkl_dcscsv(transa, m, alpha, matdescra, val, indx, pntrb, pntre,x, y)

CHARACTER*1 transa


INTEGER m


REAL*8 alpha

REAL*8 val(*)

REAL*8 x(*), y(*)

C:void mkl_dcscsv(char *transa, int *m, double *alpha, char *matdescra, double*val, int *indx, int *pntrb, int *pntre, double *x, double *y);

245


mkl_dcoosvSolves a system of linear equations for a sparsematrix in the coordinate format.

Syntax

Fortran:

call mkl_dcoosv(transa, m, k, alpha, matdescra, val, rowind, colind, nnz, x,y)

C:

mkl_dcoosv(&transa, &m, &k, &alpha, matdescra, val, rowind, colind, &nnz, x,y);

Description

The mkl_dcoosv routine solves a system of linear equations with matrix-vector operations fora sparse matrix in the coordinate format:

y := alpha*inv(A)*x

or


where:


Input Parameters


CHARACTER*1. Specifies the operation to be performed.transaIf transa = 'N' or 'n', y := alpha*inv(A)*xIf transa = 'T' or 't' or 'C' or 'c', y :=alpha*inv(A')* x,



246



matdescra


val



rowind



colind


nnz


REAL*8.xArray, DIMENSION at least m.Before entry, the array x must contain the vector x. Theelements are accessed with unit increment.


Output Parameters


247


Interfaces

Fortran 77:SUBROUTINE mkl_dcoosv(transa, m, alpha, matdescra, val, rowind, colind, nnz,x, y)

CHARACTER*1 transa


INTEGER m, nnz


REAL*8 alpha

REAL*8 val(*)

REAL*8 x(*), y(*)

C:void mkl_dcoosv(char *transa, int *m, double *alpha, char *matdescra, double*val, int *rowind, int *colind, int *nnz, double *x, double *y);

mkl_dcootrsvTriangular solvers with simplified interface for asparse matrix in the coordinate format.

Syntax

Fortran:

call mkl_dcootrsv(uplo, transa, diag, m, val, rowind, colind, nnz, x, y)

C:

mkl_dcootrsv(&uplo, &transa, &diag, &m, val, rowind, colind, &nnz, x, y);

Description

The mkl_dcootrsv routine solves a system of linear equations with matrix-vector operationsfor a sparse matrix stored in the coordinate format:

A*y = x

248


or

A'*y = x,

where:



Input Parameters



uplo


CHARACTER*1. Specifies the operation to be performed.transaIf transa = 'N' or 'n', A*y = xIf transa = 'T' or 't' or 'C' or 'c', A'*y = x,

CHARACTER*1. Specifies whether or not A is a unit triangularor not.

diag




val



rowind


249



colind


nnz



Output Parameters


Interfaces

Fortran 77:SUBROUTINE mkl_dcootrsv(uplo, transa, diag, m, val, rowind, colind, nnz, x,y)


INTEGER m, nnz


REAL*8 val(*), x(*), y(*)

C:void mkl_dcootrsv(char *uplo, char *transa, char *diag, int *m, double *alpha,char *matdescra, double *val, int *rowind, int *colind, int *nnz, double*x, double *y);

250


mkl_ddiasvSolves a system of linear equations for a sparsematrix in the diagonal format.

Syntax

Fortran:

call mkl_ddiasv(transa, m, alpha, matdescra, val, lval, idiag, ndiag, x, y)

C:

mkl_ddiasv(&transa, &m, &alpha, matdescra, val, &lval, idiag, &ndiag, x, y);

Description

The mkl_ddiasv routine solves a system of linear equations with matrix-vector operations fora sparse matrix stored in the diagonal format:

y := alpha*inv(A)*x

or

y := alpha*inv(A')* x,

where:


Input Parameters


CHARACTER*1. Specifies the operation to be performed.transaIf transa = 'N' or 'n', y := alpha*inv(A)*xIf transa = 'T' or 't' or 'C' or 'c', y :=alpha*inv(A')*x,



251



matdescra


val

INTEGER. Leading dimension of val, lval≥m. Refer to lval

description in Diagonal Storage Scheme for more details.

lval


idiag



ndiag



Output Parameters


252


Interfaces

Fortran 77:SUBROUTINE mkl_ddiasv(transa, m, alpha, matdescra, val, lval, idiag, ndiag,x, y)

CHARACTER*1 transa



INTEGER indiag(*)

REAL*8 alpha


C:void mkl_ddiasv(char *transa, int *m, double *alpha, char *matdescra, double*val, int *lval, int *idiag, int *ndiag, double *x, double *y);

mkl_ddiatrsvTriangular solvers with simplified interface for asparse matrix in the diagonal format.

Syntax

Fortran:

call mkl_ddiatrsv(uplo, transa, diag, m, val, lval, idiag, ndiag, x, y)

C:

mkl_ddiatrsv(&uplo, &transa, &diag, &m, val, &lval, idiag, &ndiag, x, y);

Description

The mkl_ddiatrsv routine solves a system of linear equations with matrix-vector operationsfor a sparse matrix stored in the diagonal:

A*y = x

253


or

A'*y = x,

where:



Input Parameters



uplo


CHARACTER*1. Specifies the operation to be performed.transaIf transa = 'N' or 'n', then A*y = xIf transa = 'T' or 't' or 'C' or 'c', then A'*y = x,

CHARACTER*1. Specifies whether A is a unit triangular ornot.

diag



REAL*8. Two-dimensional array of size lval by ndiag,contains non-zero diagonals of the matrix A. Refer to valuesarray description in Diagonal Storage Schemefor moredetails.

val



lval


idiag

254




ndiag


Output Parameters


Interfaces

Fortran 77:SUBROUTINE mkl_ddiatrsv(uplo, transa, diag, m, val, lval, idiag, ndiag, x,y)



INTEGER indiag(*)


C:void mkl_ddiatrsv(char *uplo, char *transa, char *diag, int *m, double *val,int *lval, int *idiag, int *ndiag, double *x, double *y);

255


mkl_dskysvSolves a system of linear equations for a sparsematrix in the skyline format.

Syntax

Fortran:

call mkl_dskysv(transa, m, alpha, matdescra, val, pntr, x, y)

C:

mkl_dskysv(&transa, &m, &alpha, matdescra, val, pntr, x, y);

Description

The mkl_dskysv routine solves a system of linear equations with matrix-vector operations fora sparse matrix in the skyline storage format:

y := alpha*inv(A)*x

or


where:


Input Parameters


CHARACTER*1. Specifies the operation to be performed.transaIf transa = 'N' or 'n', then y := alpha*inv(A)*xIf transa = 'T' or 't' or 'C' or 'c', then y :=alpha*inv(A')* x,



256



matdescra


val



pntr




Output Parameters


Interfaces

Fortran 77:SUBROUTINE mkl_dskysv(transa, m, alpha, matdescra, val, pntr, x, y)

CHARACTER*1 transa


257


INTEGER m

INTEGER pntr(*)

REAL*8 alpha

REAL*8 val(*), x(*), y(*)

C:void mkl_dskysv(char *transa, int *m, double *alpha, char *matdescra, double*val, int *pntr, double *x, double *y);

mkl_dcsrmmComputes matrix - matrix product of a sparsematrix stored in the CSR format.

Syntax

Fortran:

call mkl_dcsrmm(transa, m, n, k, alpha, matdescra, val, indx, pntrb, pntre,b, ldb, beta, c, ldc)

C:

mkl_dcsrmm(&transa, &m, &n, &k, &alpha, matdescra, val, indx, pntrb, pntre,b, &ldb, &beta, c, &ldc);

Description

The mkl_dcsrmm routine performs a matrix-matrix operation defined as


or

C := alpha*A'*B + beta*C,

where:


B and C are dense matrices, A is an m-by-k sparse matrix in compressed sparse row format, A'is the transpose of A.

258


Input Parameters


CHARACTER*1. Specifies the operation to be performed.transaIf transa = 'N' or 'n', the matrix-matrix product iscomputed as C := alpha*A*B + beta*CIf transa = 'T' or 't' or 'C' or 'c', the matrix-vectorproduct is computed as C := alpha*A'*B + beta*C,


INTEGER. Number of columns of the matrix C.n




matdescra


val


INTEGER. Array containing the column indices for eachnon-zero element of the matrix A. Its length is equal tolength of the val array.

indx



pntrb


pntre

REAL*8.bArray, DIMENSION (ldb, n).

259


Before entry with transa= 'N' or 'n', the leading k-by-npart of the array b must contain the matrix B, otherwise theleading m-by-n part of the array b must contain the matrixB.

INTEGER. Specifies the first dimension of b as declared inthe calling (sub)program.

ldb


REAL*8.cArray, DIMENSION (ldc, n).Before entry, the leading m-by-n part of the array c mustcontain the matrix C, otherwise the leading k-by-n part ofthe array c must contain the matrix C.

INTEGER. Specifies the first dimension of c as declared inthe calling (sub)program.

ldc

Output Parameters

Overwritten by the matrix (alpha*A*B + beta* C) or(alpha*A'*B + beta*C).

c

Interfaces

Fortran 77:SUBROUTINE mkl_dcsrmm(transa, m, n, k, alpha, matdescra, val, indx, pntrb,pntre, b, ldb, beta, c, ldc)

CHARACTER*1 transa


INTEGER m, n, k, ldb, ldc


REAL*8 alpha, beta

REAL*8 val(*), b(ldb,*), c(ldc,*)

260


C:void mkl_dcsrmm(char *transa, int *m, int *n, int *k, double *alpha, char*matdescra, double *val, int *indx, int *pntrb, int *pntre, double *b, int*ldb, double *beta, double *c, int *ldc,);

mkl_dcscmmComputes matrix-matrix product of a sparse matrixstored in the CSC format.

Syntax

Fortran:

call mkl_dcscmm(transa, m, n, k, alpha, matdescra, val, indx, pntrb, pntre,b, ldb, beta, c, ldc)

C:

mkl_dcscmm(&transa, &m, &n, &k, &alpha, matdescra, val, indx, pntrb, pntre,b, &ldb, &beta, c, &ldc);

Description

The mkl_dcscmm routine performs a matrix-matrix operation defined as


or


where:


B and C are dense matrices, A is an m-by-k sparse matrix in compressed sparse column format,A' is the transpose of A.

Input Parameters


CHARACTER*1. Specifies the operation to be performed.transa

261


If transa = 'N' or 'n', the matrix-matrix product iscomputed as C := alpha*A* B + beta*CIf transa = 'T' or 't' or 'C' or 'c', the matrix-vectorproduct is computed as C := alpha*A'*B + beta*C,






matdescra

REAL*8. Array containing non-zero elements of the matrixA. Its length is pntre(k) - pntrb(1).

val


INTEGER. Array containing the row indices for each non-zeroelement of the matrix A.Its length is equal to length of theval array.

indx



pntrb


pntre

REAL*8.bArray, DIMENSION (ldb, n).Before entry with transa = 'N' or 'n', the leading k-by-npart of the array b must contain the matrix B, otherwise theleading m-by-n part of the array b must contain the matrixB.


ldb

262





ldc

Output Parameters

Overwritten by the matrix (alpha*A*B + beta* C) or(alpha*A'*B + beta*C).

c

Interfaces

Fortran 77:SUBROUTINE mkl_dcscmm(transa, m, n, k, alpha, matdescra, val, indx, pntrb,pntre, b, ldb, beta, c, ldc)

CHARACTER*1 transa



INTEGER indx(*), pntrb(k), pntre(k)

REAL*8 alpha, beta


C:void mkl_dcscmm(char *transa, int *m, int *n, int *k, double *alpha, char*matdescra, double *val, int *indx, int *pntrb, int *pntre, double *b, int*ldb, double *beta, double *c, int *ldc);

263


mkl_dcoommComputes matrix-matrix product of a sparse matrixstored in the coordinate format.

Syntax

Fortran:

call mkl_dcoomm(transa, m, n, k, alpha, matdescra, val, rowind, colind, nnz,b, ldb, beta, c, ldc)

C:

mkl_dcoomm(&transa, &m, &n, &k, &alpha, matdescra, val, rowind, colind, &nnz,b, &ldb, &beta, c, &ldc);

Description

The mkl_dcoomm routine performs a matrix-matrix operation defined as


or


where:


B and C are dense matrices, A is an m-by-k sparse matrix in the coordinate format, A' is thetranspose of A.

Input Parameters


CHARACTER*1. Specifies the operation to be performed.transaIf transa = 'N' or 'n', the matrix-matrix product iscomputed as C := alpha*A*B + beta*CIf transa = 'T' or 't' or 'C' or 'c', the matrix-vectorproduct is computed as C := alpha*A'*B + beta*C,


264






matdescra


val



rowind



colind


nnz




ldb



265



ldc

Output Parameters

Overwritten by the matrix (alpha*A*B + beta*C) or(alpha*A'*B + beta*C).

c

Interfaces

Fortran 77:SUBROUTINE mkl_dcoomm(transa, m, n, k, alpha, matdescra, val, rowind, colind,nnz, b, ldb, beta, c, ldc)

CHARACTER*1 transa


INTEGER m, n, k, ldb, ldc, nnz


REAL*8 alpha, beta


C:void mkl_dcoomm(char *transa, int *m, int *n, int *k, double *alpha, char*matdescra, double *val, int *rowind, int *colind, int *nnz, double *b, int*ldb, double *beta, double *c, int *ldc);

mkl_ddiammComputes matrix-matrix product of a sparse matrixstored in the diagonal format.

Syntax

Fortran:

call mkl_ddiamm(transa, m, n, k, alpha, matdescra, val, lval, idiag, ndiag,b, ldb, beta, c, ldc)

266


C:

mkl_ddiamm(&transa, &m, &n, &k, &alpha, matdescra, val, &lval, idiag, &ndiag,b, &ldb, &beta, c, &ldc);

Description

The mkl_ddiamm routine performs a matrix-matrix operation defined as


or


where:


B and C are dense matrices, A is an m-by-k sparse matrix in the diagonal format, A' is thetranspose of A.

Input Parameters


CHARACTER*1. Specifies the operation to be performed.transaIf transa = 'N' or 'n', the matrix-matrix product iscomputed as C := alpha*A*B + beta*C,If transa = 'T' or 't' or 'C' or 'c', the matrix-vectorproduct is computed as C := alpha*A'*B + beta*C.






matdescra

267



val

INTEGER. Leading dimension of val, lval≥min(m, k).Refer to lval description in Diagonal Storage Scheme formore details.

lval


idiag



ndiag



ldb




ldc

Output Parameters


c

268


Interfaces

Fortran 77:SUBROUTINE mkl_ddiamm(transa, m, n, k, alpha, matdescra, val, lval, idiag,ndiag, b, ldb, beta, c, ldc)

CHARACTER*1 transa


INTEGER m, n, k, ldb, ldc, lval, ndiag

INTEGER idiag(*)

REAL*8 alpha, beta

REAL*8 val(lval,*), b(ldb,*), c(ldc,*)

C:void mkl_ddiamm(char *transa, int *m, int *n, int *k, double *alpha, char*matdescra, double *val, int *lval, int *idiag, int *ndiag, double *b, int*ldb, double *beta, double *c, int *ldc);

mkl_dskymmComputes matrix-matrix product of a sparse matrixstored using the skyline storage scheme.

Syntax

Fortran:

call mkl_dskymm(transa, m, n, k, alpha, matdescra, val, pntr, b, ldb, beta,c, ldc)

C:

mkl_dskymm(&transa, &m, &n, &k, &alpha, matdescra, val, pntr, b, &ldb, &beta,c, &ldc);

Description

The mkl_dskymm routine performs a matrix-matrix operation defined as


269


or


where:


B and C are dense matrices, A is an m-by-k sparse matrix in the skyline storage format, A' isthe transpose of A.

Input Parameters


CHARACTER*1. Specifies the operation to be performed.transaIf transa = 'N' or 'n', the matrix-matrix product iscomputed as C := alpha*A*B + beta*C,If transa = 'T' or 't' or 'C' or 'c', the matrix-vectorproduct is computed as C := alpha*A'*B + beta*C,






matdescra


val

If matdescrsa(2)= 'L', then val contains elements fromthe low triangle of the matrix A.If matdescrsa(2)= 'U', then val contains elements fromthe upper triangle of the matrix A.Refer to values array description in Skyline StorageSchemefor more details.


pntr

270





ldb




ldc

Output Parameters


c

Interfaces

Fortran 77:SUBROUTINE mkl_dskymm(transa, m, n, k, alpha, matdescra, val, pntr, b, ldb,beta, c, ldc)

CHARACTER*1 transa


271



INTEGER pntr(*)

REAL*8 alpha, beta


C:void mkl_dskymm(char *transa, int *m, int *n, int *k, double *alpha, char*matdescra, double *val, int *pntr, double *b, int *ldb, double *beta, double*c, int *ldc);

mkl_dcsrsmSolves a system of linear matrix equations for asparse matrix in the CSR format.

Syntax

Fortran:

call mkl_dcsrsm(transa, m, n, alpha, matdescra, val, indx, pntrb, pntre, b,ldb, c, ldc)

C:

mkl_dcsrsm(&transa, &m, &n, &alpha, matdescra, val, indx, pntrb, pntre, b,&ldb, c, &ldc);

Description

The mkl_dcsrsm routine solves a system of linear equations with matrix-matrix operations fora sparse matrix in the CSR format:

C := alpha*inv(A)*B

or

C := alpha*inv(A')*B,

where:

alpha is scalar, B and C are dense matrices, A is a sparse upper or lower triangular matrix withunit or non-unit main diagonal, A' is the transpose of A.

272


Input Parameters


CHARACTER*1. Specifies the operation to be performed.transaIf transa = 'N' or 'n', the matrix-matrix product iscomputed as C := alpha*inv(A)*BIf transa = 'T' or 't' or 'C' or 'c', the matrix-vectorproduct is computed as C := alpha*inv(A')*B,





matdescra


val



indx



pntrb


pntre

REAL*8.bArray, DIMENSION (ldb, n).

273


Before entry the leading m-by-n part of the array b mustcontain the matrix B.


ldb


ldc

Output Parameters

REAL*8.cArray, DIMENSION (ldc, n).The leading m-by-n part of the array c contains the outputmatrix C.

Interfaces

Fortran 77:SUBROUTINE mkl_dcsrsm(transa, m, n, alpha, matdescra, val, indx, pntrb,pntre, b, ldb, c, ldc)

CHARACTER*1 transa


INTEGER m, n, ldb, ldc


REAL*8 alpha


C:void mkl_dcsrsm(char *transa, int *m, int *n, double *alpha, char *matdescra,double *val, int *indx, int *pntrb, int *pntre, double *b, int *ldb, double*c, int *ldc);

274


mkl_dcscsmSolves a system of linear matrix equations for asparse matrix in the CSC format.

Syntax

Fortran:

call mkl_dcscsm(transa, m, n, alpha, matdescra, val, indx, pntrb, pntre, b,ldb, c, ldc)

C:

mkl_dcscsm(&transa, &m, &n, &alpha, matdescra, val, indx, pntrb, pntre, b,&ldb, c, &ldc);

Description

The mkl_dcscsm routine solves a system of linear equations with matrix-matrix operations fora sparse matrix in the CSC format:

C := alpha*inv(A)*B

or


where:


Input Parameters




275





matdescra


val


INTEGER. Array containing the row indices for each non-zeroelement of the matrix A. Its length is equal to length of theval array.

indx



pntrb


pntre

REAL*8.bArray, DIMENSION (ldb, n).Before entry the leading m-by-n part of the array b mustcontain the matrix B.


ldb


ldc

Output Parameters


276


Interfaces

Fortran 77:SUBROUTINE mkl_dcscsm(transa, m, n, alpha, matdescra, val, indx, pntrb,pntre, b, ldb, c, ldc)

CHARACTER*1 transa




REAL*8 alpha


C:void mkl_dcscsm(char *transa, int *m, int *n, double *alpha, char *matdescra,double *val, int *indx, int *pntrb, int *pntre, double *b, int *ldb, double*c, int *ldc);

mkl_dcoosmSolves a system of linear matrix equations for asparse matrix in the coordinate format.

Syntax

Fortran:

call mkl_dcoosm(transa, m, n, alpha, matdescra, val, rowind, colind, nnz, b,ldb, c, ldc)

C:

mkl_dcoosm(&transa, &m, &n, &alpha, matdescra, val, rowind, colind, &nnz, b,&ldb, c, &ldc);

277


Description

The mkl_dcoosm routine solves a system of linear equations with matrix-matrix operations fora sparse matrix in the coordinate format:

C := alpha*inv(A)*B

or


where:


Input Parameters







matdescra


val



rowind


278



colind


nnz




ldb


ldc

Output Parameters


Interfaces

Fortran 77:SUBROUTINE mkl_dcoosm(transa, m, n, alpha, matdescra, val, rowind, colind,nnz, b, ldb, c, ldc)

CHARACTER*1 transa


INTEGER m, n, ldb, ldc, nnz


REAL*8 alpha


279


C:void mkl_dcoosm(char *transa, int *m, int *n, double *alpha, char *matdescra,double *val, int *rowind, int *colind, int *nnz, double *b, int *ldb, double*c, int *ldc);

mkl_ddiasmSolves a system of linear matrix equations for asparse matrix in the diagonal format.

Syntax

Fortran:

call mkl_ddiasm(transa, m, n, alpha, matdescra, val, lval, idiag, ndiag, b,ldb, c, ldc)

C:

mkl_ddiasm(&transa, &m, &n, &alpha, matdescra, val, &lval, idiag, &ndiag, b,&ldb, c, &ldc);

Description

The mkl_ddiasm routine solves a system of linear equations with matrix-matrix operations fora sparse matrix in the diagonal format:

C := alpha*inv(A)*B

or


where:


Input Parameters


CHARACTER*1. Specifies the operation to be performed.transa

280


If transa = 'N' or 'n', the matrix-matrix product iscomputed as C := alpha*inv(A)*B,If transa = 'T' or 't' or 'C' or 'c', the matrix-vectorproduct is computed as C := alpha*inv(A')*B.





matdescra


val



lval


idiag



ndiag



ldb


ldc

Output Parameters

REAL*8.cArray, DIMENSION (ldc, n).

281


The leading m-by-n part of the array c contains the matrixC.

Interfaces

Fortran 77:SUBROUTINE mkl_ddiasm(transa, m, n, alpha, matdescra, val, lval, idiag,ndiag, b, ldb, c, ldc)

CHARACTER*1 transa


INTEGER m, n, ldb, ldc, lval, ndiag

INTEGER idiag(*)

REAL*8 alpha

REAL*8 val(lval,*), b(ldb,*), c(ldc,*)

C:void mkl_ddiasm(char *transa, int *m, int *n, double *alpha, char *matdescra,double *val, int *lval, int *idiag, int *ndiag, double *b, int *ldb, double*c, int *ldc);

mkl_dskysmSolves a system of linear matrix equations for asparse matrix stored using the skyline storagescheme.

Syntax

Fortran:

call mkl_dskysm(transa, m, n, alpha, matdescra, val, pntr, b, ldb, c, ldc)

C:

mkl_dskysm(&transa, &m, &n, &alpha, matdescra, val, pntr, b, &ldb, c, &ldc);

282


Description

The mkl_dskysm routine solves a system of linear equations with matrix-matrix operations fora sparse matrix in the skyline storage format:

C := alpha*inv(A)*B

or


where:


Input Parameters


CHARACTER*1. Specifies the operation to be performed.transaIf transa = 'N' or 'n', the matrix-matrix product iscomputed as C := alpha*inv(A)*B,If transa = 'T' or 't' or 'C' or 'c', the matrix-vectorproduct is computed as C := alpha*inv(A')*B,





matdescra


val


283


INTEGER. Array of length (m+m). It contains the indicesspecifying in the val the positions of the first non-zeroelement of each i-row (column) of the matrix A such thatpointers(i)- pointers(1)+1. Refer to pointers arraydescription in Skyline Storage Scheme for more details.

pntr



ldb


ldc

Output Parameters

REAL*8.cArray, DIMENSION (ldc, n).The leading m-by-n part of the array c contains the matrixC.

Interfaces

Fortran 77:

SUBROUTINE mkl_dskysm(transa, m, n, alpha, matdescra, val, pntr, b, ldb, c,ldc)

CHARACTER*1 transa



INTEGER pntr(*)

REAL*8 alpha


284


C:void mkl_dskysm(char *transa, int *m, int *n, double *alpha, char *matdescra,double *val, int *pntr, double *b, int *ldb, double *c, int *ldc,);

285


3LAPACK Routines: LinearEquations

This chapter describes the Intel® Math Kernel Library implementation of routines from the LAPACK packagethat are used for solving systems of linear equations and performing a number of related computationaltasks. The library includes LAPACK routines for both real and complex data. Routines are supported forsystems of equations with the following types of matrices:

• general

• banded

• symmetric or Hermitian positive-definite (both full and packed storage)

• symmetric or Hermitian positive-definite banded

• symmetric or Hermitian indefinite (both full and packed storage)

• symmetric or Hermitian indefinite banded

• triangular (both full and packed storage)

• triangular banded

• tridiagonal.

For each of the above matrix types, the library includes routines for performing the following computations:

– factoring the matrix (except for triangular matrices)

– equilibrating the matrix

– solving a system of linear equations

– estimating the condition number of a matrix

– refining the solution of linear equations and computing its error bounds

– inverting the matrix.

To solve a particular problem, you can call two or more computational routines or call a correspondingdriver routine that combines several tasks in one call, such as ?gesv for factoring and solving. For example,to solve a system of linear equations with a general matrix, call ?getrf (LU factorization) and then ?getrs(computing the solution). Then, call ?gerfs to refine the solution and get the error bounds. Alternatively,use the driver routine ?gesvx that performs all these tasks in one call.

WARNING. LAPACK routines expect that input matrices do not contain INF or NaN values.When input data is inappropriate for LAPACK, problems may arise, including possible hangs.

287

Starting from release 8.0, Intel MKL along with Fortran-77 interface to LAPACK computational anddriver routines also supports Fortran-95 interface which uses simplified routine calls with shorterargument lists. The syntax section of the routine description gives the calling sequence for Fortran-95interface immediately after Fortran-77 calls.

Routine Naming ConventionsTo call each routine introduced in this chapter from the Fortran-77 program, you can use theLAPACK name.

LAPACK names are listed in Table 3-1 and Table 3-2 , and have the structure ?yyzzz or?yyzz, which is described below.

The initial symbol ? indicates the data type:





The second and third letters yy indicate the matrix type and storage scheme:

generalge

general bandgb

general tridiagonalgt

symmetric or Hermitian positive-definitepo

symmetric or Hermitian positive-definite (packed storage)pp

symmetric or Hermitian positive-definite bandpb

symmetric or Hermitian positive-definite tridiagonalpt

symmetric indefinitesy

symmetric indefinite (packed storage)sp

Hermitian indefinitehe

Hermitian indefinite (packed storage)hp

triangulartr

triangular (packed storage)tp

triangular bandtb

For computational routines, the last three letters zzz indicate the computation performed:

288


form a triangular matrix factorizationtrf

solve the linear system with a factored matrixtrs

estimate the matrix condition numbercon

refine the solution and compute error boundsrfs

compute the inverse matrix using the factorizationtri

equilibrate a matrix.equ

For example, the sgetrf routine performs the triangular factorization of general real matricesin single precision; the corresponding routine for complex matrices is cgetrf.

For driver routines, the names can end with -sv (meaning a simple driver), or with -svx(meaning an expert driver).

Names of the LAPACK computational and driver routines for Fortran-95 interface in Intel MKLare the same as Fortran-77 names but without the first letter that indicates the data type. Forexample, the name of the routine that performs triangular factorization of general real matricesin Fortran-95 interface is getrf. Different data types are handled through defining a specificinternal parameter that refers to a module block with named constants for single and doubleprecision.

Fortran-95 Interface ConventionsFortran-95 interface to LAPACK is implemented through wrappers that call respective Fortran-77routines. This interface uses such features of Fortran-95 as assumed-shape arrays and optionalarguments to provide simplified calls to LAPACK routines with fewer arguments.

The main conventions for Fortran-95 interface are as follows:

• The names of arguments used in Fortran-95 call are typically the same as for the respectivegeneric (Fortran-77) interface. However, to reduce the number of argument names used inthe library, Fortran-95 interface uses the following identity settings of formal argumentnames:



aap

aab

afafb

289

LAPACK Routines: Linear Equations 3



afafp

bbp

bbb

selectselctg

Note that these name changes of formal arguments have no impact on program semanticsand follow the unification conventions.

• Input arguments such as array dimensions are not required in Fortran-95 and are skippedfrom the calling sequence. Array dimensions are reconstructed from the user data that mustexactly follow the required array shape.

Another type of generic arguments that are skipped in Fortran-95 interface are argumentsthat represent workspace arrays (such as work, rwork, and so on). The only exception arecases when workspace arrays return significant information on output.

An argument can also be skipped if its value is completely defined by the presence or absenceof another argument in the calling sequence, and the restored value is the only meaningfulvalue for the skipped argument.

• Some generic arguments are declared as optional in Fortran-95 interface and may or maynot be present in the calling sequence. An argument can be declared optional if it satisfiesone of the following conditions:

– If the argument value is completely defined by the presence or absence of anotherargument in the calling sequence, it can be declared as optional. The difference from theskipped argument in this case is that the optional argument can have some meaningfulvalues that are distinct from the value reconstructed by default. For example, if someargument (like jobz) can take only two values and one of these values directly impliesthe use of another argument, then the value of jobz can be uniquely reconstructed fromthe actual presence or absence of this second argument, and jobz can be omitted.

– If an input argument can take only a few possible values, it can be declared as optional.The default value of such argument is typically set as the first value in the list and allexceptions to this rule are explicitly stated in the routine description.

– If an input argument has a natural default value, it can be declared as optional. Thedefault value of such optional argument is set to its natural default value.

• Argument info is declared as optional in Fortran-95 interface. If it is present in the callingsequence, the value assigned to info is interpreted as follows:

290


– If this value is more than -1000, its meaning is the same as in Fortran-77 routine.

– If this value is equal to -1000, it means that there is not enough work memory.

– If this value is equal to -1001, incompatible arguments are present in the calling sequence.

• Optional arguments are given in square brackets in Fortran-95 call syntax.

“Fortran-95 Notes” subsection at the end of the topic describing each routine details concreterules for reconstructing the values of omitted optional parameters.

MKL Fortran-95 Interfaces for LAPACK Routines vs. Netlib Implementation

The following list presents general digressions of Intel MKL LAPACK-95 implementation fromthe Netlib analog:

• Intel® MKL Fortran-95 interfaces are provided for pure procedures.

• Names of interfaces do not contain LA_ prefix.

• An optional array argument always has the target attribute.

• Functionality of MKL LAPACK-95 wrapper is close to FORTRAN-77 original implementationin getrf, gbtrf, and potrf interfaces.

• If jobz argument value specifies presence or absence of z argument, then z is alwaysdeclared as optional and jobz is restored depending on whether z is present or not. It isnot always so in the Netlib version (see “Modified Netlib Interfaces” in Appendix E).

• To avoid double error checking, processing of info parameter is limited to checking ofallocated memory and disarranging of optional parameters.

• If an argument that is present in the list of arguments completely defines another argument,the latter is always declared as optional.

You can transform an application that uses the Netlib LAPACK interfaces to ensure its work withIntel MKL interfaces by meeting two conditions:

a. The application is correct, that is, unambiguous, compiler-independent, and contains noerrors.

b. Each routine name denotes only one specific routine. If any routine name in the applicationcoincides with a name of the original Netlib routine (for example, after removing LA_ prefix)but denotes a routine different from that Netlib original routine, this name should be modifiedthrough context replacement.

You should transform your application in the following five cases (see Appendix E for specificdifferences of individual interfaces):

1. When using Netlib routines that differ from the Intel MKL routines only by the LA_ prefix orin the array attribute target. The only transformation required in this case is context namereplacement. See “Interfaces Identical to Netlib” in Appendix E for details.

291


2. When using Netlib routines that differ from the Intel MKL routines by the LA_ prefix, thetarget array attribute, and the names of formal arguments. In the case of positional passingof arguments, no additional transformation except context name replacement is required.In the case of the key passing of arguments, in addition to the context name replacementthe names of mismatching keys should also be modified. See “Interfaces with ReplacedArgument Names” in Appendix E for details.

3. When using Netlib routines that differ from the Intel MKL routines by the LA_ prefix, thetarget array attribute, sequence of the arguments, arguments missing in MKL but presentin Netlib and, vice versa, present in MKL but missing in Netlib. Remove the differences insequence and range of the arguments in process of all the transformations specified in items2 and 3. See “Modified Netlib Interfaces” in Appendix E for details.

4. When using getrf, gbtrf, and potrf interfaces, that is, new functionality implemented inMKL but unavailable in Netlib source. To overcome the differences, build the desiredfunctionality explicitly with MKL means or create a new subroutine with the new functionality,using specific MKL interfaces corresponding to LAPACK-77 routines. You can call the latterroutines directly but using new MKL interfaces is preferable. See “Interfaces Absent FromNetlib” and “Interfaces of New Functionality”in Appendix E for details.

Note that if the transformed application calls getrf, gbtrf or potrf without controllingarguments rcond and norm, just context replacement is enough in modifying the calls intoMKL interfaces, as described in point 1 above. Netlib functionality is preserved in such cases.

5. When using Netlib auxiliary routines. In this case, call a corresponding subroutine directly,using MKL LAPACK-77 interfaces.

You can transform your application as follows:

1. Make sure conditions a. and b. are met.

2. Select Netlib LAPACK-95 calls. For each call do the following:

• Select the case of digression and do the required transformations.• Revise results to eliminate unneeded code or data, which may appear after several

identical calls.

3. Make sure the transformations are correct and complete.

Matrix Storage SchemesLAPACK routines use the following matrix storage schemes:

• Full storage: a matrix A is stored in a two-dimensional array a, with the matrix elementaij stored in the array element a(i,j).

292



• Band storage: an m-by-n band matrix with kl sub-diagonals and ku superdiagonals isstored compactly in a two-dimensional array ab with kl+ku+1 rows and n columns. Columnsof the matrix are stored in the corresponding columns of the array, and diagonals of thematrix are stored in rows of the array.

In Chapters 4 and 5 , arrays that hold matrices in packed storage have names ending in p;arrays with matrices in band storage have names ending in b.

For more information on matrix storage schemes, see “Matrix Arguments” in Appendix B.

Mathematical NotationDescriptions of LAPACK routines use the following notation:

A system of linear equations with an n-by-n matrix A = {aij},a right-hand side vector b = {bi}, and an unknown vectorx = {xi}.

Ax = b

A set of systems with a common matrix A and multipleright-hand sides. The columns of B are individual right-handsides, and the columns of X are the corresponding solutions.

AX = B

the vector with elements |xi| (absolute values of xi).|x|

the matrix with elements |aij| (absolute values of aij).|A|

The infinity-norm of the vector x.||x||∞ = maxi|xi|

The infinity-norm of the matrix A.||A||∞ = maxiΣj|aij|

The one-norm of the matrix A. ||A||1 = ||AT||∞ = ||AH||∞||A||1 = maxjΣi|aij|

The condition number of the matrix A.κ(A) = ||A|| ||A-1||

Error AnalysisIn practice, most computations are performed with rounding errors. Besides, you often needto solve a system Ax = b, where the data (the elements of A and b) are not known exactly.Therefore, it is important to understand how the data errors and rounding errors can affect thesolution x.

293


Data perturbations. If x is the exact solution of Ax = b, and x + δx is the exact solution of

a perturbed problem (A + δA)x = (b + δb), then

where

In other words, relative errors in A or b may be amplified in the solution vector x by a factor

κ(A) = ||A|| ||A-1|| called the condition number of A.

Rounding errors have the same effect as relative perturbations c(n)ε in the original data.

Here ε is the machine precision, and c(n) is a modest function of the matrix order n. Thecorresponding solution error is

||δx||/||x||≤ c(n)κ(A)ε. (The value of c(n) is seldom greater than 10n.)

Thus, if your matrix A is ill-conditioned (that is, its condition number κ(A) is very large),then the error in the solution x is also large; you may even encounter a complete loss of

precision. LAPACK provides routines that allow you to estimate κ(A) (see Routines for Estimatingthe Condition Number) and also give you a more precise estimate for the actual solution error(see Refining the Solution and Estimating Its Error).

Computational RoutinesTable 3-1 lists the LAPACK computational routines (Fortran-77 and Fortran-95 interfaces) forfactorizing, equilibrating, and inverting real matrices, estimating their condition numbers,solving systems of equations with real matrices, refining the solution, and estimating its error.Table 3-2 lists similar routines for complex matrices. Respective routine names in Fortran-95interface are without the first symbol (see Routine Naming Conventions).

294


Table 3-1 Computational Routines for Systems of Equations with Real Matrices

Invert matrixEstimateerror

Conditionnumber

Solvesystem

Equilibratematrix

Factorizematrix

Matrix type,storage scheme

?getri?gerfs?gecon?getrs?geequ?getrfgeneral

?gbrfs?gbcon?gbtrs?gbequ?gbtrfgeneral band

?gtrfs?gtcon?gttrs?gttrfgeneraltridiagonal

?potri?porfs?pocon?potrs?poequ?potrfsymmetricpositive-definite

?pptri?pprfs?ppcon?pptrs?ppequ?pptrfsymmetricpositive-definite,packed storage

?pbrfs?pbcon?pbtrs?pbequ?pbtrfsymmetricpositive-definite,band

?ptrfs?ptcon?pttrs?pttrfsymmetricpositive-definite,tridiagonal

?sytri?syrfs?sycon?sytrs?sytrfsymmetricindefinite

?sptri?sprfs?spcon?sptrs?sptrfsymmetricindefinite, packedstorage

?trtri?trrfs?trcon?trtrstriangular

?tptri?tprfs?tpcon?tptrstriangular,packed storage

?tbrfs?tbcon?tbtrstriangular band

In the table above, ? denotes s (single precision) or d (double precision) for Fortran-77 interface.

295


Table 3-2 Computational Routines for Systems of Equations with Complex Matrices


Conditionnumber

Solvesystem

Equilibratematrix

Factorizematrix


?getri?gerfs?gecon?getrs?geequ?getrfgeneral

?gbrfs?gbcon?gbtrs?gbequ?gbtrfgeneral band

?gtrfs?gtcon?gttrs?gttrfgeneraltridiagonal

?potri?porfs?pocon?potrs?poequ?potrfHermitianpositive-definite

?pptri?pprfs?ppcon?pptrs?ppequ?pptrfHermitianpositive-definite,packed storage

?pbrfs?pbcon?pbtrs?pbequ?pbtrfHermitianpositive-definite,band

?ptrfs?ptcon?pttrs?pttrfHermitianpositive-definite,tridiagonal

?hetri?herfs?hecon?hetrs?hetrfHermitianindefinite

?sytri?syrfs?sycon?sytrs?sytrfsymmetricindefinite

?hptri?hprfs?hpcon?hptrs?hptrfHermitianindefinite, packedstorage

?sptri?sprfs?spcon?sptrs?sptrfsymmetricindefinite, packedstorage

?trtri?trrfs?trcon?trtrstriangular

?tptri?tprfs?tpcon?tptrstriangular,packed storage

296



Conditionnumber

Solvesystem

Equilibratematrix

Factorizematrix


?tbrfs?tbcon?tbtrstriangular band

In the table above, ? stands for c (single precision complex) or z (double precision complex)for Fortran-77 interface.

Routines for Matrix Factorization

This section describes the LAPACK routines for matrix factorization. The following factorizationsare supported:

• LU factorization

• Cholesky factorization of real symmetric positive-definite matrices

• Cholesky factorization of Hermitian positive-definite matrices

• Bunch-Kaufman factorization of real and complex symmetric matrices

• Bunch-Kaufman factorization of Hermitian matrices.

You can compute:

• the LU factorization using full and band storage of matrices

• the Cholesky factorization using full, packed, and band storage

• the Bunch-Kaufman factorization using full and packed storage.

?getrfComputes the LU factorization of a general m-by-nmatrix.

Syntax

Fortran 77:

call sgetrf( m, n, a, lda, ipiv, info )

call dgetrf( m, n, a, lda, ipiv, info )

call cgetrf( m, n, a, lda, ipiv, info )

call zgetrf( m, n, a, lda, ipiv, info )

297


Fortran 95:

call getrf( a [,ipiv] [,info] )

Description

The routine computes the LU factorization of a general m-by-n matrix A as

A = P*L*U,

where P is a permutation matrix, L is lower triangular with unit diagonal elements (lowertrapezoidal if m > n) and U is upper triangular (upper trapezoidal if m < n). Usually A is square(m = n), and both L and U are triangular. The routine uses partial pivoting, with row interchanges.

Input Parameters

INTEGER. The number of rows in the matrix A (m ≥ 0).m

INTEGER. The number of columns in A; n ≥ 0.n

REAL for sgetrfaDOUBLE PRECISION for dgetrfCOMPLEX for cgetrfDOUBLE COMPLEX for zgetrf.Array, DIMENSION (lda,*). Contains the matrix A. The seconddimension of a must be at least max(1, n).

INTEGER. The first dimension of array a.lda

Output Parameters

Overwritten by L and U. The unit diagonal elements of L are not stored.a

INTEGER.ipivArray, DIMENSION at least max(1,min(m, n)). The pivot indices: rowi was interchanged with row ipiv(i).

INTEGER. If info=0, the execution is successful.infoIf info = -i, the i-th parameter had an illegal value.If info = i, uii is 0. The factorization has been completed, but U isexactly singular. Division by 0 will occur if you use the factor U forsolving a system of linear equations.

298

Routines in Fortran 95 interface have fewer arguments in the calling sequence than their Fortran77 counterparts. For general conventions applied to skip redundant or reconstructible arguments,see Fortran-95 Interface Conventions.

Specific details for the routine getrf interface are as follows:


Holds the vector of length min(m,n).ipiv

Application Notes

The computed L and U are the exact factors of a perturbed matrix A + E, where

|E| ≤ c(min(m,n))ε P|L||U|

c(n) is a modest linear function of n, and ε is the machine precision.

The approximate number of floating-point operations for real flavors is

If m = n,(2/3)n3

If m > n,(1/3)n2(3m-n)

If m < n.(1/3)m2(3n-m)

The number of operations for complex flavors is four times greater.

After calling this routine with m = n, you can call the following:

to solve A*x = B or ATX = B or AHX = B?getrs

to estimate the condition number of A?gecon

to compute the inverse of A.?getri

299

?gbtrfComputes the LU factorization of a general m-by-nband matrix.

Syntax

Fortran 77:

call sgbtrf( m, n, kl, ku, ab, ldab, ipiv, info )

call dgbtrf( m, n, kl, ku, ab, ldab, ipiv, info )

call cgbtrf( m, n, kl, ku, ab, ldab, ipiv, info )

call zgbtrf( m, n, kl, ku, ab, ldab, ipiv, info )

Fortran 95:

call gbtrf( a [,kl] [,m] [,ipiv] [,info] )

Description

The routine forms the LU factorization of a general m-by-n band matrix A with kl non-zerosubdiagonals and ku non-zero superdiagonals. Usually A is square (m = n), and then

A = P*L*U,

where P is a permutation matrix; L is lower triangular with unit diagonal elements and at mostkl non-zero elements in each column; U is an upper triangular band matrix with kl + kusuperdiagonals. The routine uses partial pivoting, with row interchanges (which creates theadditional kl superdiagonals in U).

Input Parameters

INTEGER. The number of rows in matrix A (m ≥ 0).m

INTEGER. The number of columns in matrix A; n ≥ 0.n

INTEGER. The number of subdiagonals within the band of A; kl ≥ 0.kl

INTEGER. The number of superdiagonals within the band of A; ku ≥ 0.ku

REAL for sgbtrfabDOUBLE PRECISION for dgbtrfCOMPLEX for cgbtrfDOUBLE COMPLEX for zgbtrf.

300


Array, DIMENSION (ldab,*). The array ab contains the matrix A inband storage (see Matrix Storage Schemes). The second dimension ofab must be at least max(1, n).

INTEGER. The first dimension of the array ab. (ldab ≥ 2*kl + ku + 1)ldab

Output Parameters

Overwritten by L and U. The diagonal and kl + ku superdiagonals ofU are stored in the first 1 + kl + ku rows of ab. The multipliers usedto form L are stored in the next kl rows.

ab

INTEGER.ipivArray, DIMENSION at least max(1,min(m, n)). The pivot indices: rowi was interchanged with row ipiv(i).

INTEGER. If info = 0, the execution is successful.infoIf info = -i, the i-th parameter had an illegal value.If info = i, uii is 0. The factorization has been completed, but U isexactly singular. Division by 0 will occur if you use the factor U forsolving a system of linear equations.



Specific details for the routine gbtrf interface are as follows:

Stands for argument ab in Fortan 77 interface. Holds the array A of size(2*kl+ku+1,n).

a

Holds the vector of length min(m,n).ipiv

If omitted, assumed kl = ku.kl

Restored as ku = lda-2*kl-1.ku

If omitted, assumed m = n.m

Application Notes

The computed L and U are the exact factors of a perturbed matrix A + E, where

|E| ≤ c(kl+ku+1) ε P|L||U|

301


c(k) is a modest linear function of k, and ε is the machine precision.

The total number of floating-point operations for real flavors varies between approximately2n(ku+1)kl and 2n(kl+ku+1)kl. The number of operations for complex flavors is four timesgreater. All these estimates assume that kl and ku are much less than min(m,n).

After calling this routine with m = n, you can call the following routines:

to solve A*X = B or AT*X = B or AH*X = Bgbtrs

to estimate the condition number of A.gbcon

?gttrfComputes the LU factorization of a tridiagonalmatrix.

Syntax

Fortran 77:

call sgttrf( n, dl, d, du, du2, ipiv, info )

call dgttrf( n, dl, d, du, du2, ipiv, info )

call cgttrf( n, dl, d, du, du2, ipiv, info )

call zgttrf( n, dl, d, du, du2, ipiv, info )

Fortran 95:

call gttrf( dl, d, du, du2 [, ipiv] [,info] )

Description

The routine computes the LU factorization of a real or complex tridiagonal matrix A in the form

A = P*L*U,

where P is a permutation matrix; L is lower bidiagonal with unit diagonal elements; and U isan upper triangular matrix with nonzeroes in only the main diagonal and first two superdiagonals.The routine uses elimination with partial pivoting and row interchanges.

Input Parameters

INTEGER. The order of the matrix A; n ≥ 0.n

REAL for sgttrfdl, d, du

302


DOUBLE PRECISION for dgttrfCOMPLEX for cgttrfDOUBLE COMPLEX for zgttrf.Arrays containing elements of A.The array dl of dimension (n - 1) contains the subdiagonal elementsof A.The array d of dimension n contains the diagonal elements of A.The array du of dimension (n - 1) contains the superdiagonal elementsof A.

Output Parameters

Overwritten by the (n-1) multipliers that define the matrix L from theLU factorization of A.

dl

Overwritten by the n diagonal elements of the upper triangular matrixU from the LU factorization of A.

d

Overwritten by the (n-1) elements of the first superdiagonal of U.du

REAL for sgttrfdu2DOUBLE PRECISION for dgttrfCOMPLEX for cgttrfDOUBLE COMPLEX for zgttrf.Array, dimension (n-2). On exit, du2 contains (n-2) elements of thesecond superdiagonal of U.

INTEGER.ipivArray, dimension (n). The pivot indices: row i was interchanged withrow ipiv(i).

INTEGER. If info = 0, the execution is successful.infoIf info = -i, the i-th parameter had an illegal value.If info = i, uii is 0. The factorization has been completed, but U isexactly singular. Division by zero will occur if you use the factor U forsolving a system of linear equations.



Specific details for the routine gttrf interface are as follows:

303


Holds the vector of length (n-1).dl

Holds the vectror of length (n).d

Holds the vector of length (n-1).du

Holds the vector of length (n-2).du2

Holds the vector of length (n).ipiv

Application Notes

to solve A*X = B or AT*X = B or AH*X = B?gbtrs

to estimate the condition number of A.?gbcon

?potrfComputes the Cholesky factorization of asymmetric (Hermitian) positive-definite matrix.

Syntax

Fortran 77:

call spotrf( uplo, n, a, lda, info )

call dpotrf( uplo, n, a, lda, info )

call cpotrf( uplo, n, a, lda, info )

call zpotrf( uplo, n, a, lda, info )

Fortran 95:

call potrf( a [, uplo] [,info] )

Description

This routine forms the Cholesky factorization of a symmetric positive-definite or, for complexdata, Hermitian positive-definite matrix A:

if uplo='U'A = UH*U

if uplo='L',A = L*LH

where L is a lower triangular matrix and U is upper triangular.

304


Input Parameters

CHARACTER*1. Must be 'U' or 'L'.uploIndicates whether the upper or lower triangular part of A is stored andhow A is factored:If uplo = 'U', the array a stores the upper triangular part of the matrixA, and A is factored as UH*U.If uplo = 'L', the array a stores the lower triangular part of the matrixA; A is factored as L*LH.

INTEGER. The order of matrix A; n ≥ 0.n

REAL for spotrfaDOUBLE PRECISION for dpotrfCOMPLEX for cpotrfDOUBLE COMPLEX for zpotrf.Array, DIMENSION (lda,*). The array a contains either the upper orthe lower triangular part of the matrix A (see uplo). The seconddimension of a must be at least max(1, n).

INTEGER. The first dimension of a.lda

Output Parameters

The upper or lower triangular part of a is overwritten by the Choleskyfactor U or L, as specified by uplo.

a

INTEGER. If info=0, the execution is successful.infoIf info = -i, the i-th parameter had an illegal value.If info = i, the leading minor of order i (and therefore the matrix Aitself) is not positive-definite, and the factorization could not becompleted. This may indicate an error in forming the matrix A.



Specific details for the routine potrf interface are as follows:

Holds the matrix A of size (n, n).a


305


Application Notes

If uplo = 'U', the computed factor U is the exact factor of a perturbed matrix A + E, where


A similar estimate holds for uplo = 'L'.

The total number of floating-point operations is approximately (1/3)n3 for real flavors or(4/3)n3 for complex flavors.

After calling this routine, you can call the following routines:

to solve A*X = B?potrs

to estimate the condition number of A?pocon

to compute the inverse of A.?potri

?pptrfComputes the Cholesky factorization of asymmetric (Hermitian) positive-definite matrixusing packed storage.

Syntax

Fortran 77:

call spptrf( uplo, n, ap, info )

call dpptrf( uplo, n, ap, info )

call cpptrf( uplo, n, ap, info )

call zpptrf( uplo, n, ap, info )

Fortran 95:

call pptrf( a [, uplo] [,info] )

306


Description

This routine forms the Cholesky factorization of a symmetric positive-definite or, for complexdata, Hermitian positive-definite packed matrix A:

if uplo='U'A = UH*U

if uplo='L',A = L*LH


Input Parameters

CHARACTER*1. Must be 'U' or 'L'.uploIndicates whether the upper or lower triangular part of A is packed inthe array ap, and how A is factored:If uplo = 'U', the array ap stores the upper triangular part of thematrix A, and A is factored as UH*U.If uplo = 'L', the array ap stores the lower triangular part of the matrixA; A is factored as L*LH.


REAL for spptrfapDOUBLE PRECISION for dpptrfCOMPLEX for cpptrfDOUBLE COMPLEX for zpptrf.Array, DIMENSION at least max(1, n(n+1)/2). The array ap containseither the upper or the lower triangular part of the matrix A (as specifiedby uplo) in packed storage (see Matrix Storage Schemes).

Output Parameters

The upper or lower triangular part of A in packed storage is overwrittenby the Cholesky factor U or L, as specified by uplo.

ap


307




Specific details for the routine pptrf interface are as follows:

Stands for argument ap in Fortan 77 interface. Holds the array A ofsize (n*(n+1)/2).

a


Application Notes




The total number of floating-point operations is approximately (1/3)n3 for real flavors and(4/3)n3 for complex flavors.


to solve A*X = B?pptrs

to estimate the condition number of A?ppcon

to compute the inverse of A.?pptri

308


?pbtrfComputes the Cholesky factorization of asymmetric (Hermitian) positive-definite bandmatrix.

Syntax

Fortran 77:

call spbtrf( uplo, n, kd, ab, ldab, info )

call dpbtrf( uplo, n, kd, ab, ldab, info )

call cpbtrf( uplo, n, kd, ab, ldab, info )

call zpbtrf( uplo, n, kd, ab, ldab, info )

Fortran 95:

call pbtrf( a [, uplo] [,info] )

Description

This routine forms the Cholesky factorization of a symmetric positive- definite or, for complexdata, Hermitian positive-definite band matrix A:

if uplo='U'A = UT*U for realdata, A = UH*U forcomplex data

if uplo='L',A = L*LT for realdata, A = L*LH forcomplex data


Input Parameters

CHARACTER*1. Must be 'U' or 'L'.uploIndicates whether the upper or lower triangular part of A is stored inthe array ab, and how A is factored:If uplo = 'U', the upper triangle of A is stored.If uplo = 'L', the lower triangle of A is stored.


309


INTEGER. The number of superdiagonals or subdiagonals in the matrix

A(kd ≥ 0).

kd

REAL for spbtrfabDOUBLE PRECISION for dpbtrfCOMPLEX for cpbtrfDOUBLE COMPLEX for zpbtrf.Array, DIMENSION (,*). The array ap contains either the upper or thelower triangular part of the matrix A (as specified by uplo) in bandstorage (see Matrix Storage Schemes). The second dimension of abmust be at least max(1, n).

INTEGER. The first dimension of the array ab. (ldab ≥ kd + 1)ldab

Output Parameters

The upper or lower triangular part of A (in band storage) is overwrittenby the Cholesky factor U or L, as specified by uplo.

ap




Specific details for the routine pbtrf interface are as follows:

Stands for argument ab in Fortan 77 interface. Holds the array A of size(kd+1,n).

a


Application Notes


310




The total number of floating-point operations for real flavors is approximately n(kd+1)2. Thenumber of operations for complex flavors is 4 times greater. All these estimates assume thatkd is much less than n.


to solve A*X = B?pbtrs

to estimate the condition number of A.?pbcon

?pttrfComputes the factorization of a symmetric(Hermitian) positive-definite tridiagonal matrix.

Syntax

Fortran 77:

call spttrf( n, d, e, info )

call dpttrf( n, d, e, info )

call cpttrf( n, d, e, info )

call zpttrf( n, d, e, info )

Fortran 95:

call pttrf( d, e [,info] )

Description

This routine forms the factorization of a symmetric positive-definite or, for complex data,Hermitian positive-definite tridiagonal matrix A:

A = L*D*L',

where D is diagonal and L is unit lower bidiagonal. The factorization may also be regarded ashaving the form A = U'*D*U, where D is unit upper bidiagonal.

311


Input Parameters


REAL for spttrf, cpttrfdDOUBLE PRECISION for dpttrf, zpttrf.Array, dimension (n). Contains the diagonal elements of A.

REAL for spttrfeDOUBLE PRECISION for dpttrfCOMPLEX for cpttrfDOUBLE COMPLEX for zpttrf. Array, dimension (n -1). Contains thesubdiagonal elements of A.

Output Parameters

Overwritten by the n diagonal elements of the diagonal matrix D fromthe L*D*L' factorization of A.

d

Overwritten by the (n - 1) off-diagonal elements of the unit bidiagonalfactor L or U from the factorization of A.

e

INTEGER. If info = 0, the execution is successful.infoIf info = -i, the i-th parameter had an illegal value.If info = i, the leading minor of order i (and therefore the matrix Aitself) is not positive-definite; if i < n, the factorization could not becompleted, while if i = n, the factorization was completed, but d(n)

≤ 0.



Specific details for the routine pttrf interface are as follows:

Holds the vector of length (n).d

Holds the vector of length (n-1).e

312

?sytrfComputes the Bunch-Kaufman factorization of asymmetric matrix.

Syntax

Fortran 77:

call ssytrf( uplo, n, a, lda, ipiv, work, lwork, info )

call dsytrf( uplo, n, a, lda, ipiv, work, lwork, info )

call csytrf( uplo, n, a, lda, ipiv, work, lwork, info )

call zsytrf( uplo, n, a, lda, ipiv, work, lwork, info )

Fortran 95:

call sytrf( a [, uplo] [,ipiv] [, info] )

Description

This routine forms the Bunch-Kaufman factorization of a symmetric matrix:

if uplo='U', A = P*U*D*UT*PT

if uplo='L', A = P*L*D*LT*PT,

where A is the input matrix, P is a permutation matrix, U and L are upper and lower triangularmatrices with unit diagonal, and D is a symmetric block-diagonal matrix with 1-by-1 and 2-by-2diagonal blocks. U and L have 2-by-2 unit diagonal blocks corresponding to the 2-by-2 blocksof D.

Input Parameters

CHARACTER*1. Must be 'U' or 'L'.uploIndicates whether the upper or lower triangular part of A is stored andhow A is factored:If uplo = 'U', the array a stores the upper triangular part of thematrix A, and A is factored as P*U*D*UT*PT.If uplo = 'L', the array a stores the lower triangular part of the matrixA, and A is factored as P*L*D*LT*PT.


313


REAL for ssytrfaDOUBLE PRECISION for dsytrfCOMPLEX for csytrfDOUBLE COMPLEX for zsytrf.Array, DIMENSION (lda,*). The array a contains either the upper orthe lower triangular part of the matrix A (see uplo). The seconddimension of a must be at least max(1, n).

INTEGER. The first dimension of a; at least max(1, n).lda

Same type as a. A workspace array, dimension at least max(1,lwork).work

INTEGER. The size of the work array (lwork ≥ n).lwork

If lwork = -1, then a workspace query is assumed; the routine onlycalculates the optimal size of the work array, returns this value as thefirst entry of the work array, and no error message related to lwork isissued by xerbla.See Application Notes for the suggested value of lwork.

Output Parameters

The upper or lower triangular part of a is overwritten by details of theblock-diagonal matrix D and the multipliers used to obtain the factor U(or L).

a

If info=0, on exit work(1) contains the minimum value of lworkrequired for optimum performance. Use this lwork for subsequent runs.

work(1)

INTEGER.ipivArray, DIMENSION at least max(1, n). Contains details of theinterchanges and the block structure of D. If ipiv(i) = k >0, thendii is a 1-by-1 block, and the i-th row and column of A wasinterchanged with the k-th row and column.If uplo = 'U' and ipiv(i) =ipiv(i-1) = -m < 0, then D has a 2-by-2block in rows/columns i and i-1, and (i-1)-th row and column of Awas interchanged with the m-th row and column.If uplo = 'L' and ipiv(i) =ipiv(i+1) = -m < 0, then D has a 2-by-2block in rows/columns i and i+1, and (i+1)-th row and column of Awas interchanged with the m-th row and column.

INTEGER. If info = 0, the execution is successful.infoIf info = -i, the i-th parameter had an illegal value.

314

If info = i, dii is 0. The factorization has been completed, but D isexactly singular. Division by 0 will occur if you use D for solving a systemof linear equations.



Specific details for the routine sytrf interface are as follows:

holds the matrix A of size (n, n)a

holds the vector of length (n)ipiv

must be 'U' or 'L'. The default value is 'U'.uplo

Application Notes

For better performance, try using lwork = n*blocksize, where blocksize is amachine-dependent value (typically, 16 to 64) required for optimum performance of the blockedalgorithm.

If you are in doubt how much workspace to supply, use a generous value of lwork for the firstrun or set lwork = -1.

If you choose the first option and set any of admissible lwork sizes, which is no less than theminimal value described, the routine completes the task, though probably not so fast as witha recommended workspace, and provides the recommended workspace in the first element ofthe corresponding array work on exit. Use this value (work(1)) for subsequent runs.

If you set lwork = -1, the routine returns immediately and provides the recommendedworkspace in the first element of the corresponding array (work). This operation is called aworkspace query.

Note that if you set lwork to less than the minimal required value and not -1, the routinereturns immediately with an error exit and does not provide any information on the recommendedworkspace.

The 2-by-2 unit diagonal blocks and the unit diagonal elements of U and L are not stored. Theremaining elements of U and L are stored in the corresponding columns of the array a, butadditional row interchanges are required to recover U or L explicitly (which is seldom necessary).

If ipiv(i) = i for all i =1...n, then all off-diagonal elements of U (L) are stored explicitly inthe corresponding elements of the array a.

315


If uplo = 'U', the computed factors U and D are the exact factors of a perturbed matrix A +E, where

|E| ≤ c(n)ε P|U||D||UT|PT

c(n) is a modest linear function of n, and ε is the machine precision. A similar estimate holdsfor the computed L and D when uplo = 'L'.



to solve A*X = B?sytrs

to estimate the condition number of A?sycon

to compute the inverse of A.?sytri

?hetrfComputes the Bunch-Kaufman factorization of acomplex Hermitian matrix.

Syntax

Fortran 77:

call chetrf( uplo, n, a, lda, ipiv, work, lwork, info )

call zhetrf( uplo, n, a, lda, ipiv, work, lwork, info )

Fortran 95:

call hetrf( a [, uplo] [,ipiv] [,info] )

Description

This routine forms the Bunch-Kaufman factorization of a Hermitian matrix:

if uplo='U', A = P*U*D*UH*PT

if uplo='L', A = P*L*D*LH*PT,

316


where A is the input matrix, P is a permutation matrix, U and L are upper and lower triangularmatrices with unit diagonal, and D is a Hermitian block-diagonal matrix with 1-by-1 and 2-by-2diagonal blocks. U and L have 2-by-2 unit diagonal blocks corresponding to the 2-by-2 blocksof D.

Input Parameters

CHARACTER*1. Must be 'U' or 'L'.uploIndicates whether the upper or lower triangular part of A is stored andhow A is factored:If uplo = 'U', the array a stores the upper triangular part of thematrix A, and A is factored as P*U*D*UH*PT.If uplo = 'L', the array a stores the lower triangular part of the matrixA, and A is factored as P*L*D*LH*PT.


COMPLEX for chetrfa, workDOUBLE COMPLEX for zhetrf.Arrays, DIMENSION a(lda,*), work(*).The array a contains the upper or the lower triangular part of the matrixA (see uplo). The second dimension of a must be at least max(1, n).work(*) is a workspace array of dimension at least max(1, lwork).



If lwork = -1, then a workspace query is assumed; the routine onlycalculates the optimal size of the work array, returns this value as thefirst entry of the work array, and no error message related to lwork isissued by xerbla.See Application Notes for the suggested value of lwork.

Output Parameters

The upper or lower triangular part of a is overwritten by details of theblock-diagonal matrix D and the multipliers used to obtain the factor U(or L).

a

If info = 0, on exit work(1) contains the minimum value of lworkrequired for optimum performance. Use this lwork for subsequent runs.

work(1)

INTEGER.ipiv

317


Array, DIMENSION at least max(1, n). Contains details of theinterchanges and the block structure of D. If ipiv(i) = k > 0, thendii is a 1-by-1 block, and the i-th row and column of A wasinterchanged with the k-th row and column.If uplo = 'U' and ipiv(i) = ipiv(i-1) = -m < 0, then D has a 2-by-2block in rows/columns i and i-1, and the (i-1)-th row and column ofA was interchanged with the m-th row and column.If uplo = 'L' and ipiv(i) = ipiv(i+1) = -m < 0, then D has a2-by-2 block in rows/columns i and i+1, and the (i+1)-th row andcolumn of A was interchanged with the m-th row and column.

INTEGER. If info = 0, the execution is successful.infoIf info = -i, the i-th parameter had an illegal value.If info = i, dii is 0. The factorization has been completed, but D isexactly singular. Division by 0 will occur if you use D for solving a systemof linear equations.



Specific details for the routine hetrf interface are as follows:

holds the matrix A of size (n, n)a

holds the vector of length (n)ipiv


Application Notes

This routine is suitable for Hermitian matrices that are not known to be positive-definite. If Ais in fact positive-definite, the routine does not perform interchanges, and no 2-by-2 diagonalblocks occur in D.



318





If ipiv(i) = i for all i =1...n, then all off-diagonal elements of U (L) are stored explicitly inthe corresponding elements of the array a.




A similar estimate holds for the computed L and D when uplo = 'L'.

The total number of floating-point operations is approximately (4/3)n3.


to solve A*X = B?hetrs

to estimate the condition number of A?hecon

to compute the inverse of A.?hetri

319


?sptrfComputes the Bunch-Kaufman factorization of asymmetric matrix using packed storage.

Syntax

Fortran 77:

call ssptrf( uplo, n, ap, ipiv, info )

call dsptrf( uplo, n, ap, ipiv, info )

call csptrf( uplo, n, ap, ipiv, info )

call zsptrf( uplo, n, ap, ipiv, info )

Fortran 95:

call sptrf( a [,uplo] [,ipiv] [,info] )

Description

This routine forms the Bunch-Kaufman factorization of a symmetric matrix A using packedstorage:

if uplo='U', A = P*U*D*UT*PT

if uplo='L', A = P*L*D*LT*PT,

where P is a permutation matrix, U and L are upper and lower triangular matrices with unitdiagonal, and D is a symmetric block-diagonal matrix with 1-by-1 and 2-by-2 diagonal blocks.U and L have 2-by-2 unit diagonal blocks corresponding to the 2-by-2 blocks of D.

Input Parameters

CHARACTER*1. Must be 'U' or 'L'.uploIndicates whether the upper or lower triangular part of A is packed inthe array ap and how A is factored:If uplo = 'U', the array ap stores the upper triangular part of thematrix A, and A is factored as P*U*D*UT*PT.If uplo = 'L', the array ap stores the lower triangular part of thematrix A, and A is factored as P*L*D*LT*PT.


320


REAL for ssptrfapDOUBLE PRECISION for dsptrfCOMPLEX for csptrfDOUBLE COMPLEX for zsptrf.Array, DIMENSION at least max(1, n(n+1)/2). The array ap containsthe upper or the lower triangular part of the matrix A (as specified byuplo) in packed storage (see Matrix Storage Schemes).

Output Parameters

The upper or lower triangle of A (as specified by uplo) is overwrittenby details of the block-diagonal matrix D and the multipliers used toobtain the factor U (or L).

ap

INTEGER.ipivArray, DIMENSION at least max(1, n). Contains details of theinterchanges and the block structure of D. If ipiv(i) = k > 0, thendii is a 1-by-1 block, and the i-th row and column of A wasinterchanged with the k-th row and column.If uplo = 'U' and ipiv(i) =ipiv(i-1) = -m < 0, then D has a 2-by-2block in rows/columns i and i-1, and the (i-1)-th row and column ofA was interchanged with the m-th row and column.If uplo = 'L' and ipiv(i) =ipiv(i+1) = -m < 0, then D has a 2-by-2block in rows/columns i and i+1, and the (i+1)-th row and column ofA was interchanged with the m-th row and column.




Specific details for the routine sptrf interface are as follows:

stands for argument ap in Fortan 77 interface. Holds the array A of size(n*(n+1)/2).

a

321

holds the vector of length (n).ipiv


Application Notes

The 2-by-2 unit diagonal blocks and the unit diagonal elements of U and L are not stored. Theremaining elements of U and L overwrite elements of the corresponding columns of the matrixA, but additional row interchanges are required to recover U or L explicitly (which is seldomnecessary).

If ipiv(i) = i for all i = 1...n, then all off-diagonal elements of U (L) are stored explicitlyin packed form.



c(n) is a modest linear function of n, and ε is the machine precision. A similar estimate holdsfor the computed L and D when uplo = 'L'.



to solve A*X = B?sptrs

to estimate the condition number of A?spcon

to compute the inverse of A.?sptri

?hptrfComputes the Bunch-Kaufman factorization of acomplex Hermitian matrix using packed storage.

Syntax

Fortran 77:

call chptrf( uplo, n, ap, ipiv, info )

call zhptrf( uplo, n, ap, ipiv, info )

322


Fortran 95:

call hptrf( a [,uplo] [,ipiv] [,info] )

Description

This routine forms the Bunch-Kaufman factorization of a Hermitian matrix using packed storage:

if uplo='U', A = P*U*D*UH*PT

if uplo='L', A = P*L*D*LH*PT,

where A is the input matrix, P is a permutation matrix, U and L are upper and lower triangularmatrices with unit diagonal, and D is a Hermitian block-diagonal matrix with 1-by-1 and 2-by-2diagonal blocks. U and L have 2-by-2 unit diagonal blocks corresponding to the 2-by-2 blocksof D.

Input Parameters

CHARACTER*1. Must be 'U' or 'L'.uploIndicates whether the upper or lower triangular part of A is packed andhow A is factored:If uplo = 'U', the array ap stores the upper triangular part of thematrix A, and A is factored as P*U*D*UH*PT.If uplo = 'L', the array ap stores the lower triangular part of thematrix A, and A is factored as P*L*D*LH*PT.


COMPLEX for chptrfapDOUBLE COMPLEX for zhptrf.Array, DIMENSION at least max(1, n(n+1)/2). The array ap containsthe upper or the lower triangular part of the matrix A (as specified byuplo) in packed storage (see Matrix Storage Schemes).

Output Parameters

The upper or lower triangle of A (as specified by uplo) is overwrittenby details of the block-diagonal matrix D and the multipliers used toobtain the factor U (or L).

ap

INTEGER.ipiv

323


Array, DIMENSION at least max(1, n). Contains details of theinterchanges and the block structure of D. If ipiv(i) = k > 0, thendii is a 1-by-1 block, and the i-th row and column of A wasinterchanged with the k-th row and column.If uplo = 'U' and ipiv(i) =ipiv(i-1) = -m < 0, then D has a 2-by-2block in rows/columns i and i-1, and the (i-1)-th row and column ofA was interchanged with the m-th row and column.If uplo = 'L' and ipiv(i) =ipiv(i+1) = -m < 0, then D has a 2-by-2block in rows/columns i and i+1, and the (i+1)-th row and column ofA was interchanged with the m-th row and column.




Specific details for the routine hptrf interface are as follows:

Stands for argument ap in Fortan 77 interface. Holds the array A of size(n*(n+1)/2).

a



Application Notes


If ipiv(i) = i for all i = 1...n, then all off-diagonal elements of U (L) are stored explicitlyin the corresponding elements of the array a.



324


A similar estimate holds for the computed L and D when uplo = 'L'.

The total number of floating-point operations is approximately (4/3)n3.


to solve A*X = B?hptrs

to estimate the condition number of A?hpcon

to compute the inverse of A.?hptri

Routines for Solving Systems of Linear Equations

This section describes the LAPACK routines for solving systems of linear equations. Beforecalling most of these routines, you need to factorize the matrix of your system of equations(see Routines for Matrix Factorization in this chapter). However, the factorization is not necessaryif your system of equations has a triangular matrix.

?getrsSolves a system of linear equations with anLU-factored square matrix, with multiple right-handsides.

Syntax

Fortran 77:

call sgetrs( trans, n, nrhs, a, lda, ipiv, b, ldb, info )

call dgetrs( trans, n, nrhs, a, lda, ipiv, b, ldb, info )

call cgetrs( trans, n, nrhs, a, lda, ipiv, b, ldb, info )

call zgetrs( trans, n, nrhs, a, lda, ipiv, b, ldb, info )

Fortran 95:

call getrs( a, ipiv, b [, trans] [,info] )

Description

This routine solves for X the following systems of linear equations:

if trans='N',A*X = B

325


if trans='T',AT*X = B

if trans='C' (for complex matrices only).AH*X = B

Before calling this routine, you must call ?getrf to compute the LU factorization of A.

Input Parameters

CHARACTER*1. Must be 'N' or 'T' or 'C'.transIndicates the form of the equations:If trans = 'N', then A*X = B is solved for X.If trans = 'T', then AT*X = B is solved for X.If trans = 'C', then AH*X = B is solved for X.

INTEGER. The order of A; the number of rows in B(n ≥ 0).n

INTEGER. The number of right-hand sides; nrhs ≥ 0.nrhs

REAL for sgetrsa, bDOUBLE PRECISION for dgetrsCOMPLEX for cgetrsDOUBLE COMPLEX for zgetrs.Arrays: a(lda,*), b(ldb,*).The array a contains LU factorization of matrix A resulting from the callof ?getrf .The array b contains the matrix B whose columns are the right-handsides for the systems of equations.The second dimension of a must be at least max(1,n), the seconddimension of b at least max(1,nrhs).

INTEGER. The first dimension of a; lda ≥ max(1, n).lda

INTEGER. The first dimension of b; ldb ≥ max(1, n).ldb

INTEGER.ipivArray, DIMENSION at least max(1, n). The ipiv array, as returned by?getrf.

Output Parameters



326




Specific details for the routine getrs interface are as follows:


Holds the matrix B of size (n, nrhs).b


Must be 'N', 'C', or 'T'. The default value is 'N'.trans

Application Notes

For each right-hand side b, the computed solution is the exact solution of a perturbed systemof equations (A + E)x = b, where

|E| ≤ c(n)ε P|L||U|


If x0 is the true solution, the computed solution x satisfies this error bound:

where cond(A,x)= || |A-1||A| |x| ||∞ / ||x||∞ ≤ ||A-1||∞ ||A||∞ = κ∞(A).

Note that cond(A,x) can be much smaller than κ∞(A); the condition number of AT and AH

might or might not be equal to κ∞(A).

The approximate number of floating-point operations for one right-hand side vector b is 2n2

for real flavors and 8n2 for complex flavors.

To estimate the condition number κ∞(A), call ?gecon.

To refine the solution and estimate the error, call ?gerfs.

327


?gbtrsSolves a system of linear equations with anLU-factored band matrix, with multiple right-handsides.

Syntax

Fortran 77:

call sgbtrs( trans, n, kl, ku, nrhs, ab, ldab, ipiv, b, ldb, info )

call dgbtrs( trans, n, kl, ku, nrhs, ab, ldab, ipiv, b, ldb, info )

call cgbtrs( trans, n, kl, ku, nrhs, ab, ldab, ipiv, b, ldb, info )

call zgbtrs( trans, n, kl, ku, nrhs, ab, ldab, ipiv, b, ldb, info )

Fortran 95:

call gbtrs( a, b, ipiv, [, kl] [, trans] [, info] )

Description

This routine solves for X the following systems of linear equations:




Here A is an LU-factored general band matrix of order n with kl non-zero subdiagonals and kunonzero superdiagonals. Before calling this routine, call ?gbtrf to compute the LU factorizationof A.

Input Parameters

CHARACTER*1. Must be 'N' or 'T' or 'C'.trans

INTEGER. The order of A; the number of rows in B; n ≥ 0.n




REAL for sgbtrsab, b

328


DOUBLE PRECISION for dgbtrsCOMPLEX for cgbtrsDOUBLE COMPLEX for zgbtrs.Arrays: ab(ldab,*), b(ldb,*).The array ab contains the matrix A in band storage (see Matrix StorageSchemes).The array b contains the matrix B whose columns are the right-handsides for the systems of equations.The second dimension of ab must be at least max(1, n), and thesecond dimension of b at least max(1,nrhs).

INTEGER. The leading dimension of the array ab; ldab ≥ 2*kl + ku +1.ldab

INTEGER. The leading dimension of b; ldb ≥ max(1, n).ldb

INTEGER. Array, DIMENSION at least max(1, n). The ipiv array, asreturned by ?gbtrf.

ipiv

Output Parameters


INTEGER. If info=0, the execution is successful.infoIf info = -i, the i-th parameter had an illegal value.



Specific details for the routine gbtrs interface are as follows:


a


Holds the vector of length min(m, n).ipiv


Restored as lda-2*kl-1.ku


329


Application Notes


|E| ≤ c(kl + ku + 1)ε P|L||U|






The approximate number of floating-point operations for one right-hand side vector is 2n(ku+ 2kl) for real flavors. The number of operations for complex flavors is 4 times greater. Allthese estimates assume that kl and ku are much less than min(m,n).

To estimate the condition number κ∞(A), call ?gbcon.

To refine the solution and estimate the error, call ?gbrfs.

330


?gttrsSolves a system of linear equations with atridiagonal matrix using the LU factorizationcomputed by ?gttrf.

Syntax

Fortran 77:

call sgttrs( trans, n, nrhs, dl, d, du, du2, ipiv, b, ldb, info )

call dgttrs( trans, n, nrhs, dl, d, du, du2, ipiv, b, ldb, info )

call cgttrs( trans, n, nrhs, dl, d, du, du2, ipiv, b, ldb, info )

call zgttrs( trans, n, nrhs, dl, d, du, du2, ipiv, b, ldb, info )

Fortran 95:

call gttrs( dl, d, du, du2, b, ipiv [, trans] [,info] )

Description

This routine solves for X the following systems of linear equations with multiple right handsides:




Before calling this routine, you must call ?gttrf to compute the LU factorization of A.

Input Parameters

CHARACTER*1. Must be 'N' or 'T' or 'C'.transIndicates the form of the equations:If trans = 'N', then A*X = B is solved for X.If trans = 'T', then AT*X = B is solved for X.If trans = 'C', then AH*X = B is solved for X.

INTEGER. The order of A; n ≥ 0.n

INTEGER. The number of right-hand sides, that is, the number of

columns in B; nrhs ≥ 0.

nrhs

331


REAL for sgttrsdl,d,du,du2,bDOUBLE PRECISION for dgttrsCOMPLEX for cgttrsDOUBLE COMPLEX for zgttrs.Arrays: dl(n -1), d(n), du(n -1), du2(n -2), b(ldb,nrhs).The array dl contains the (n - 1) multipliers that define the matrixL from the LU factorization of A.The array d contains the n diagonal elements of the upper triangularmatrix U from the LU factorization of A.The array du contains the (n - 1) elements of the first superdiagonalof U.The array du2 contains the (n - 2) elements of the second superdiagonalof U.The array b contains the matrix B whose columns are the right-handsides for the systems of equations.


INTEGER. Array, DIMENSION (n). The ipiv array, as returned by?gttrf.

ipiv

Output Parameters





Specific details for the routine gttrs interface are as follows:






332




Application Notes


|E| ≤ c(n)ε P|L||U|






The approximate number of floating-point operations for one right-hand side vector b is 7n(including n divisions) for real flavors and 34n (including 2n divisions) for complex flavors.

To estimate the condition number κ∞(A), call ?gtcon.

To refine the solution and estimate the error, call ?gtrfs.

333


?potrsSolves a system of linear equations with aCholesky-factored symmetric (Hermitian)positive-definite matrix.

Syntax

Fortran 77:

call spotrs( uplo, n, nrhs, a, lda, b, ldb, info )

call dpotrs( uplo, n, nrhs, a, lda, b, ldb, info )

call cpotrs( uplo, n, nrhs, a, lda, b, ldb, info )

call zpotrs( uplo, n, nrhs, a, lda, b, ldb, info )

Fortran 95:

call potrs( a, b [,uplo] [, info] )

Description

This routine solves for X the system of linear equations A*X = B with a symmetricpositive-definite or, for complex data, Hermitian positive-definite matrix A, given the Choleskyfactorization of A:

If uplo = 'U'A = UT*U for realdata, A = UH*U forcomplex data

If uplo = 'L',A = L*LT for realdata, A = L*LH forcomplex data

where L is a lower triangular matrix and U is upper triangular. The system is solved with multipleright-hand sides stored in the columns of the matrix B.

Before calling this routine, you must call ?potrf to compute the Cholesky factorization of A.

Input Parameters

CHARACTER*1. Must be 'U' or 'L'.uploIndicates how the input matrix A has been factored:If uplo = 'U', the upper triangle of A is stored.

334


If uplo = 'L', the lower triangle of A is stored.


INTEGER. The number of right-hand sides (nrhs ≥ 0).nrhs

REAL for spotrsa, bDOUBLE PRECISION for dpotrsCOMPLEX for cpotrsDOUBLE COMPLEX for zpotrs.Arrays: a(lda,*), b(ldb,*).The array a contains the factor U or L (see uplo).The array b contains the matrix B whose columns are the right-handsides for the systems of equations.The second dimension of a must be at least max(1,n), the seconddimension of b at least max(1,nrhs).



Output Parameters





Specific details for the routine potrs interface are as follows:




335


Application Notes

If uplo = 'U', the computed solution for each right-hand side b is the exact solution of aperturbed system of equations (A + E)x = b, where

|E| ≤ c(n)ε |UH||U|


A similar estimate holds for uplo = 'L'. If x0 is the true solution, the computed solution x

satisfies this error bound:


Note that cond(A,x) can be much smaller than κ∞ (A). The approximate number of floating-pointoperations for one right-hand side vector b is 2n2 for real flavors and 8n2 for complex flavors.

To estimate the condition number κ∞(A), call ?pocon.

To refine the solution and estimate the error, call ?porfs.

?pptrsSolves a system of linear equations with a packedCholesky-factored symmetric (Hermitian)positive-definite matrix.

Syntax

Fortran 77:

call spptrs( uplo, n, nrhs, ap, b, ldb, info )

call dpptrs( uplo, n, nrhs, ap, b, ldb, info )

call cpptrs( uplo, n, nrhs, ap, b, ldb, info )

call zpptrs( uplo, n, nrhs, ap, b, ldb, info )

336


Fortran 95:

call pptrs( a, b [,uplo] [,info] )

Description

This routine solves for X the system of linear equations A*X = B with a packed symmetricpositive-definite or, for complex data, Hermitian positive-definite matrix A, given the Choleskyfactorization of A:

If uplo = 'U'A = UT*U for realdata, A = UH*U forcomplex data

If uplo = 'L',A = L*LT for realdata, A = L*LH forcomplex data


Before calling this routine, you must call ?pptrf to compute the Cholesky factorization of A.

Input Parameters

CHARACTER*1. Must be 'U' or 'L'.uploIndicates how the input matrix A has been factored:If uplo = 'U', the upper triangle of A is stored.If uplo = 'L', the lower triangle of A is stored.



REAL for spptrsap, bDOUBLE PRECISION for dpptrsCOMPLEX for cpptrsDOUBLE COMPLEX for zpptrs.Arrays: ap(*), b(ldb,*)The dimension of ap must be at least max(1,n(n+1)/2).The array ap contains the factor U or L, as specified by uplo, in packedstorage (see Matrix Storage Schemes).

337


The array b contains the matrix B whose columns are the right-handsides for the systems of equations. The second dimension of b mustbe at least max(1, nrhs).


Output Parameters





Specific details for the routine pptrs interface are as follows:


a



Application Notes

If uplo = 'U', the computed solution for each right-hand side b is the exact solution of aperturbed system of equations (A + E)x = b, where

|E| ≤ c(n)ε |UH||U|




338



Note that cond(A,x) can be much smaller than κ∞(A).

The approximate number of floating-point operations for one right-hand side vector b is 2n2

for real flavors and 8n2 for complex flavors.

To estimate the condition number κ∞(A), call ?ppcon.

To refine the solution and estimate the error, call ?pprfs.

?pbtrsSolves a system of linear equations with aCholesky-factored symmetric (Hermitian)positive-definite band matrix.

Syntax

Fortran 77:

call spbtrs( uplo, n, kd, nrhs, ab, ldab, b, ldb, info )

call dpbtrs( uplo, n, kd, nrhs, ab, ldab, b, ldb, info )

call cpbtrs( uplo, n, kd, nrhs, ab, ldab, b, ldb, info )

call zpbtrs( uplo, n, kd, nrhs, ab, ldab, b, ldb, info )

Fortran 95:

call pbtrs( a, b [,uplo] [,info] )

Description

This routine solves for real data a system of linear equations A*X = B with a symmetricpositive-definite or, for complex data, Hermitian positive-definite band matrix A, given theCholesky factorization of A:

If uplo='U'A = UT*U for realdata, A = UH*U forcomplex data

339


If uplo='L',A = L*LT for realdata, A = L*LH forcomplex data


Before calling this routine, you must call ?pbtrf to compute the Cholesky factorization of A inthe band storage form.

Input Parameters

CHARACTER*1. Must be 'U' or 'L'.uploIndicates how the input matrix A has been factored:If uplo = 'U', the upper triangular factor is stored in ab.If uplo = 'L', the lower triangular factor is stored in ab.



A; kd ≥ 0.

kd


REAL for spbtrsab, bDOUBLE PRECISION for dpbtrsCOMPLEX for cpbtrsDOUBLE COMPLEX for zpbtrs.Arrays: ab(ldab,*), b(ldb,*).The array ab contains the Cholesky factor, as returned by thefactorization routine, in band storage form.The array b contains the matrix B whose columns are the right-handsides for the systems of equations.The second dimension of ab must be at least max(1, n), and thesecond dimension of b at least max(1,nrhs).

INTEGER. The first dimension of the array ab; ldab ≥ kd +1.ldab


Output Parameters


340





Specific details for the routine pbtrs interface are as follows:


a



Application Notes


|E| ≤ c(kd + 1)ε P|UH||U| or |E| ≤ c(kd + 1)ε P|LH||L|





The approximate number of floating-point operations for one right-hand side vector is 4n*kdfor real flavors and 16n*kd for complex flavors.

To estimate the condition number κ∞(A), call ?pbcon.

To refine the solution and estimate the error, call ?pbrfs.

341


?pttrsSolves a system of linear equations with asymmetric (Hermitian) positive-definite tridiagonalmatrix using the factorization computed by ?pttrf.

Syntax

Fortran 77:

call spttrs( n, nrhs, d, e, b, ldb, info )

call dpttrs( n, nrhs, d, e, b, ldb, info )

call cpttrs( uplo, n, nrhs, d, e, b, ldb, info )

call zpttrs( uplo, n, nrhs, d, e, b, ldb, info )

Fortran 95:

call pttrs( d, e, b [,info] )

call pttrs( d, e, b [,uplo] [,info] )

Description

This routine solves for X a system of linear equations A*X = B with a symmetric (Hermitian)positive-definite tridiagonal matrix A. Before calling this routine, call ?pttrf to compute theL*D*L' for real data and the L*D*L' or U'*D*U factorization of A for complex data.

Input Parameters

CHARACTER*1. Used for cpttrs/zpttrs only. Must be 'U' or 'L'.uploSpecifies whether the superdiagonal or the subdiagonal of the tridiagonalmatrix A is stored and how A is factored:If uplo = 'U', the array e stores the superdiagonal of A, and A isfactored as U'*D*U.If uplo = 'L', the array e stores the subdiagonal of A, and A is factoredas L*D*L'.

INTEGER. The order of A; n ≥ 0.n


columns of the matrix B; nrhs ≥ 0.

nrhs

REAL for spttrs, cpttrsd

342


DOUBLE PRECISION for dpttrs, zpttrs.Array, dimension (n). Contains the diagonal elements of the diagonalmatrix D from the factorization computed by ?pttrf.

REAL for spttrse, bDOUBLE PRECISION for dpttrsCOMPLEX for cpttrsDOUBLE COMPLEX for zpttrs.Arrays: e(n -1), b(ldb, nrhs).The array e contains the (n - 1) off-diagonal elements of the unitbidiagonal factor U or L from the factorization computed by ?pttrf(see uplo).The array b contains the matrix B whose columns are the right-handsides for the systems of equations.


Output Parameters





Specific details for the routine pttrs interface are as follows:




Used in complex flavors only. Must be 'U' or 'L'. The default value is'U'.

uplo

343


?sytrsSolves a system of linear equations with a UDU-or LDL-factored symmetric matrix.

Syntax

Fortran 77:

call ssytrs( uplo, n, nrhs, a, lda, ipiv, b, ldb, info )

call dsytrs( uplo, n, nrhs, a, lda, ipiv, b, ldb, info )

call csytrs( uplo, n, nrhs, a, lda, ipiv, b, ldb, info )

call zsytrs( uplo, n, nrhs, a, lda, ipiv, b, ldb, info )

Fortran 95:

call sytrs( a, b, ipiv [,uplo] [,info] )

Description

This routine solves for X the system of linear equations A*X = B with a symmetric matrix A,given the Bunch-Kaufman factorization of A:

A = P*U*D*UT*PTif uplo='U',

A = P*L*D*LT*PT,if uplo='L',

where P is a permutation matrix, U and L are upper and lower triangular matrices with unitdiagonal, and D is a symmetric block-diagonal matrix. The system is solved with multipleright-hand sides stored in the columns of the matrix B. You must supply to this routine thefactor U (or L) and the array ipiv returned by the factorization routine ?sytrf.

Input Parameters

CHARACTER*1. Must be 'U' or 'L'.uploIndicates how the input matrix A has been factored:If uplo = 'U', the array a stores the upper triangular factor U of thefactorization A = P*U*D*UT*PT.If uplo = 'L', the array a stores the lower triangular factor L of thefactorization A = P*L*D*LT*PT.


344



INTEGER. Array, DIMENSION at least max(1, n). The ipiv array, asreturned by ?sytrf.

ipiv

REAL for ssytrsa, bDOUBLE PRECISION for dsytrsCOMPLEX for csytrsDOUBLE COMPLEX for zsytrs.Arrays: a(lda,*), b(ldb,*).The array a contains the factor U or L (see uplo).The array b contains the matrix B whose columns are the right-handsides for the system of equations.The second dimension of a must be at least max(1,n), and the seconddimension of b at least max(1,nrhs).



Output Parameters





Specific details for the routine sytrs interface are as follows:





345


Application Notes


|E| ≤ c(n)ε P|U||D||UT|PT or |E| ≤ c(n)ε P|L||D||UT|PT





The total number of floating-point operations for one right-hand side vector is approximately2n2 for real flavors or 8n2 for complex flavors.

To estimate the condition number κ∞(A), call ?sycon.

To refine the solution and estimate the error, call ?syrfs.

?hetrsSolves a system of linear equations with a UDU-or LDL-factored Hermitian matrix.

Syntax

Fortran 77:

call chetrs( uplo, n, nrhs, a, lda, ipiv, b, ldb, info )

call zhetrs( uplo, n, nrhs, a, lda, ipiv, b, ldb, info )

Fortran 95:

call hetrs( a, b, ipiv [, uplo] [,info] )

346


Description

This routine solves for X the system of linear equations A*X = B with a Hermitian matrix A,given the Bunch-Kaufman factorization of A:

A = P*U*D*UH*PTif uplo = 'U'

A = P*L*D*LH*PT,if uplo = 'L'

where P is a permutation matrix, U and L are upper and lower triangular matrices with unitdiagonal, and D is a symmetric block-diagonal matrix. The system is solved with multipleright-hand sides stored in the columns of the matrix B. You must supply to this routine thefactor U (or L) and the array ipiv returned by the factorization routine ?hetrf.

Input Parameters

CHARACTER*1. Must be 'U' or 'L'.uploIndicates how the input matrix A has been factored:If uplo = 'U', the array a stores the upper triangular factor U of thefactorization A = P*U*D*UH*PT.If uplo = 'L', the array a stores the lower triangular factor L of thefactorization A = P*L*D*LH*PT.



INTEGER.ipivArray, DIMENSION at least max(1, n).The ipiv array, as returned by ?hetrf.

COMPLEX for chetrsa, bDOUBLE COMPLEX for zhetrs.Arrays: a(lda,*), b(ldb,*).The array a contains the factor U or L (see uplo).The array b contains the matrix B whose columns are the right-handsides for the system of equations.The second dimension of a must be at least max(1,n), the seconddimension of b at least max(1,nrhs).



347


Output Parameters





Specific details for the routine hetrs interface are as follows:





Application Notes


|E| ≤ c(n)ε P|U||D||UH|PT or |E| ≤ c(n)ε P|L||D||LH|PT





The total number of floating-point operations for one right-hand side vector is approximately8n2.

348


To estimate the condition number κ∞(A), call ?hecon.

To refine the solution and estimate the error, call ?herfs.

?sptrsSolves a system of linear equations with a UDU-or LDL-factored symmetric matrix using packedstorage.

Syntax

Fortran 77:

call ssptrs( uplo, n, nrhs, ap, ipiv, b, ldb, info )

call dsptrs( uplo, n, nrhs, ap, ipiv, b, ldb, info )

call csptrs( uplo, n, nrhs, ap, ipiv, b, ldb, info )

call zsptrs( uplo, n, nrhs, ap, ipiv, b, ldb, info )

Fortran 95:

call sptrs( a, b, ipiv [, uplo] [,info] )

Description

This routine solves for X the system of linear equations A*X = B with a symmetric matrix A,given the Bunch-Kaufman factorization of A:

A = PUDUTPTif uplo='U',

A = PLDLTPT,if uplo='L',

where P is a permutation matrix, U and L are upper and lower packed triangular matrices withunit diagonal, and D is a symmetric block-diagonal matrix. The system is solved with multipleright-hand sides stored in the columns of the matrix B. You must supply the factor U (or L) andthe array ipiv returned by the factorization routine ?sptrf.

Input Parameters

CHARACTER*1. Must be 'U' or 'L'.uploIndicates how the input matrix A has been factored:

349


If uplo = 'U', the array ap stores the packed factor U of thefactorization A = P*U*D*UT*PT. If uplo = 'L', the array ap stores thepacked factor L of the factorization A = P*L*D*LT*PT.



INTEGER.ipivArray, DIMENSION at least max(1, n). The ipiv array, as returned by?sptrf.

REAL for ssptrsap, bDOUBLE PRECISION for dsptrsCOMPLEX for csptrsDOUBLE COMPLEX for zsptrs.Arrays: ap(*), b(ldb,*).The dimension of ap must be at least max(1,n(n+1)/2). The array apcontains the factor U or L, as specified by uplo, in packed storage(see Matrix Storage Schemes).The array b contains the matrix B whose columns are the right-handsides for the system of equations. The second dimension of b must beat least max(1, nrhs).


Output Parameters





Specific details for the routine sptrs interface are as follows:


a


350




Application Notes


|E| ≤ c(n)ε P|U||D||UT|PT or |E| ≤ c(n)ε P|L||D||LT|PT





The total number of floating-point operations for one right-hand side vector is approximately2n2 for real flavors or 8n2 for complex flavors.

To estimate the condition number κ∞(A), call ?spcon.

To refine the solution and estimate the error, call ?sprfs.

351


?hptrsSolves a system of linear equations with a UDU-or LDL-factored Hermitian matrix using packedstorage.

Syntax

Fortran 77:

call chptrs( uplo, n, nrhs, ap, ipiv, b, ldb, info )

call zhptrs( uplo, n, nrhs, ap, ipiv, b, ldb, info )

Fortran 95:

call hptrs( a, b, ipiv [,uplo] [,info] )

Description

This routine solves for X the system of linear equations A*X = B with a Hermitian matrix A,given the Bunch-Kaufman factorization of A:

A = P*U*D*UH*PTif uplo='U',

A = P*L*D*LH*PT,if uplo='L',

where P is a permutation matrix, U and L are upper and lower packed triangular matrices withunit diagonal, and D is a symmetric block-diagonal matrix. The system is solved with multipleright-hand sides stored in the columns of the matrix B.

You must supply to this routine the arrays ap (containing U or L)and ipiv in the form returnedby the factorization routine ?hptrf.

Input Parameters

CHARACTER*1. Must be 'U' or 'L'.uploIndicates how the input matrix A has been factored:If uplo = 'U', the array ap stores the packed factor U of thefactorization A = P*U*D*UH*PT. If uplo = 'L', the array ap stores thepacked factor L of the factorization A = P*L*D*LH*PT.



352


INTEGER. Array, DIMENSION at least max(1, n). The ipiv array, asreturned by ?hptrf.

ipiv

COMPLEX for chptrsap, bDOUBLE COMPLEX for zhptrs.Arrays: ap(*), b(ldb,*).The dimension of ap must be at least max(1,n(n+1)/2). The array apcontains the factor U or L, as specified by uplo, in packed storage(see Matrix Storage Schemes).The array b contains the matrix B whose columns are the right-handsides for the system of equations. The second dimension of b must beat least max(1, nrhs).


Output Parameters





Specific details for the routine hptrs interface are as follows:


a




Application Notes


|E| ≤ c(n)ε P|U||D||UH|PT or |E| ≤ c(n)ε P|L||D||LH|PT

353






The total number of floating-point operations for one right-hand side vector is approximately8n2 for complex flavors.

To estimate the condition number κ∞(A), call ?hpcon.

To refine the solution and estimate the error, call ?hprfs.

?trtrsSolves a system of linear equations with atriangular matrix, with multiple right-hand sides.

Syntax

Fortran 77:

call strtrs( uplo, trans, diag, n, nrhs, a, lda, b, ldb, info )

call dtrtrs( uplo, trans, diag, n, nrhs, a, lda, b, ldb, info )

call ctrtrs( uplo, trans, diag, n, nrhs, a, lda, b, ldb, info )

call ztrtrs( uplo, trans, diag, n, nrhs, a, lda, b, ldb, info )

Fortran 95:

call trtrs( a, b [,uplo] [, trans] [,diag] [,info] )

354


Description

This routine solves for X the following systems of linear equations with a triangular matrix A,with multiple right-hand sides stored in B:




Input Parameters

CHARACTER*1. Must be 'U' or 'L'.uploIndicates whether A is upper or lower triangular:If uplo = 'U', then A is upper triangular.If uplo = 'L', then A is lower triangular.

CHARACTER*1. Must be 'N' or 'T' or 'C'.transIf trans = 'N', then A*X = B is solved for X.If trans = 'T', then AT*X = B is solved for X.If trans = 'C', then AH*X = B is solved for X.

CHARACTER*1. Must be 'N' or 'U'.diagIf diag = 'N', then A is not a unit triangular matrix.If diag = 'U', then A is unit triangular: diagonal elements of A areassumed to be 1 and not referenced in the array a.



REAL for strtrsa, bDOUBLE PRECISION for dtrtrsCOMPLEX for ctrtrsDOUBLE COMPLEX for ztrtrs.Arrays: a(lda,*), b(ldb,*).The array a contains the matrix A.The array b contains the matrix B whose columns are the right-handsides for the systems of equations.The second dimension of a must be at least max(1,n), the seconddimension of b at least max(1,nrhs).


355



Output Parameters





Specific details for the routine trtrs interface are as follows:

Stands for argument ap in Fortan 77 interface. Holds the matrix A ofsize (n*(n+1)/2).

a





Application Notes


|E| ≤ c(n)ε |A|

c(n) is a modest linear function of n, and ε is the machine precision. If x0 is the true solution,the computed solution x satisfies this error bound:


356




The approximate number of floating-point operations for one right-hand side vector b is n2 forreal flavors and 4n2 for complex flavors.

To estimate the condition number κ∞(A), call ?trcon.

To estimate the error in the solution, call ?trrfs.

?tptrsSolves a system of linear equations with a packedtriangular matrix, with multiple right-hand sides.

Syntax

Fortran 77:

call stptrs( uplo, trans, diag, n, nrhs, ap, b, ldb, info )

call dtptrs( uplo, trans, diag, n, nrhs, ap, b, ldb, info )

call ctptrs( uplo, trans, diag, n, nrhs, ap, b, ldb, info )

call ztptrs( uplo, trans, diag, n, nrhs, ap, b, ldb, info )

Fortran 95:

call tptrs( a, b [,uplo] [, trans] [,diag] [,info] )

Description

This routine solves for X the following systems of linear equations with a packed triangularmatrix A, with multiple right-hand sides stored in B:




Input Parameters

CHARACTER*1. Must be 'U' or 'L'.uploIndicates whether A is upper or lower triangular:

357


If uplo = 'U', then A is upper triangular.If uplo = 'L', then A is lower triangular.


CHARACTER*1. Must be 'N' or 'U'.diagIf diag = 'N', then A is not a unit triangular matrix.If diag = 'U', then A is unit triangular: diagonal elements are assumedto be 1 and not referenced in the array ap.



REAL for stptrsap, bDOUBLE PRECISION for dtptrsCOMPLEX for ctptrsDOUBLE COMPLEX for ztptrs.Arrays: ap(*), b(ldb,*).The dimension of ap must be at least max(1,n(n+1)/2). The array apcontains the matrix A in packed storage (see Matrix StorageSchemes).The array b contains the matrix B whose columns are the right-handsides for the system of equations. The second dimension of b must beat least max(1, nrhs).


Output Parameters





Specific details for the routine tptrs interface are as follows:

358



a





Application Notes


|E| ≤ c(n)ε |A|






The approximate number of floating-point operations for one right-hand side vector b is n2 forreal flavors and 4n2 for complex flavors.

To estimate the condition number κ∞(A), call ?tpcon.

To estimate the error in the solution, call ?tprfs.

359


?tbtrsSolves a system of linear equations with a bandtriangular matrix, with multiple right-hand sides.

Syntax

Fortran 77:

call stbtrs( uplo, trans, diag, n, kd, nrhs, ab, ldab, b, ldb, info )

call dtbtrs( uplo, trans, diag, n, kd, nrhs, ab, ldab, b, ldb, info )

call ctbtrs( uplo, trans, diag, n, kd, nrhs, ab, ldab, b, ldb, info )

call ztbtrs( uplo, trans, diag, n, kd, nrhs, ab, ldab, b, ldb, info )

Fortran 95:

call tbtrs( a, b [,uplo] [, trans] [,diag] [,info] )

Description

This routine solves for X the following systems of linear equations with a band triangular matrixA, with multiple right-hand sides stored in B:




Input Parameters



CHARACTER*1. Must be 'N' or 'U'.diagIf diag = 'N', then A is not a unit triangular matrix.

360


If diag = 'U', then A is unit triangular: diagonal elements are assumedto be 1 and not referenced in the array ab.



A; kd ≥ 0.

kd


REAL for stbtrsab, bDOUBLE PRECISION for dtbtrsCOMPLEX for ctbtrsDOUBLE COMPLEX for ztbtrs.Arrays: ab(ldab,*), b(ldb,*).The array ab contains the matrix A in band storage form.The array b contains the matrix B whose columns are the right-handsides for the systems of equations.The second dimension of ab must be at least max(1, n), the seconddimension of b at least max(1,nrhs).

INTEGER. The first dimension of ab; ldab ≥ kd + 1.ldab


Output Parameters





Specific details for the routine tbtrs interface are as follows:

Stands for the argument ab in Fortan 77 interface. Holds the array Aof size (kd+1,n)

a



361




Application Notes


|E|≤ c(n)ε|A|

c(n) is a modest linear function of n, and ε is the machine precision. If x0 is the true solution,the computed solution x satisfies this error bound:




The approximate number of floating-point operations for one right-hand side vector b is 2n*kdfor real flavors and 8n*kd for complex flavors.

To estimate the condition number κ∞(A), call ?tbcon.

To estimate the error in the solution, call ?tbrfs.

Routines for Estimating the Condition Number

This section describes the LAPACK routines for estimating the condition number of a matrix.The condition number is used for analyzing the errors in the solution of a system of linearequations (see Error Analysis). Since the condition number may be arbitrarily large when thematrix is nearly singular, the routines actually compute the reciprocal condition number.

362


?geconEstimates the reciprocal of the condition numberof a general matrix in the 1-norm or theinfinity-norm.

Syntax

Fortran 77:

call sgecon( norm, n, a, lda, anorm, rcond, work, iwork, info )

call dgecon( norm, n, a, lda, anorm, rcond, work, iwork, info )

call cgecon( norm, n, a, lda, anorm, rcond, work, rwork, info )

call zgecon( norm, n, a, lda, anorm, rcond, work, rwork, info )

Fortran 95:

call gecon( a, anorm, rcond [,norm] [,info] )

Description

This routine estimates the reciprocal of the condition number of a general matrix A in the 1-normor infinity-norm:

κ1(A) =||A||1||A-1||1 = κ∞(A

T)= κ∞(AH)

κ∞(A) =||A||∞||A-1||∞ = κ1(AT)= κ1(AH).

Before calling this routine:

• compute anorm (either ||A||1 =maxj Σi |aij| or ||A||∞ = maxi Σj |aij|)

• call ?getrf to compute the LU factorization of A.

Input Parameters

CHARACTER*1. Must be '1' or 'O' or 'I'.normIf norm = '1' or 'O', then the routine estimates the condition numberof matrix A in 1-norm.If norm = 'I', then the routine estimates the condition number ofmatrix A in infinity-norm.

363



REAL for sgecona, workDOUBLE PRECISION for dgeconCOMPLEX for cgeconDOUBLE COMPLEX for zgecon. Arrays: a(lda,*), work(*).The array a contains the LU-factored matrix A, as returned by ?getrf.The second dimension of a must be at least max(1,n). The array workis a workspace for the routine.The dimension of work must be at least max(1, 4*n) for real flavorsand max(1, 2*n) for complex flavors.

REAL for single precision flavors.anormDOUBLE PRECISION for double precision flavors. The norm of theoriginal matrix A (see Description).


INTEGER. Workspace array, DIMENSION at least max(1, n).iwork

REAL for cgeconrworkDOUBLE PRECISION for zgecon.Workspace array, DIMENSION at least max(1, 2*n).

Output Parameters

REAL for single precision flavors.rcondDOUBLE PRECISION for double precision flavors.An estimate of the reciprocal of the condition number. The routine setsrcond = 0 if the estimate underflows; in this case the matrix is singular(to working precision). However, anytime rcond is small compared to1.0, for the working precision, the matrix may be poorly conditionedor even singular.




Specific details for the routine gecon interface are as follows:

364



Must be '1', 'O', or 'I'. The default value is '1'.norm

Application Notes

The computed rcond is never less than r (the reciprocal of the true condition number) and inpractice is nearly always less than 10r. A call to this routine involves solving a number ofsystems of linear equations A*x = b or AH*x = b; the number is usually 4 or 5 and nevermore than 11. Each solution requires approximately 2*n2 floating-point operations for realflavors and 8*n2 for complex flavors.

?gbconEstimates the reciprocal of the condition numberof a band matrix in the 1-norm or the infinity-norm.

Syntax

Fortran 77:

call sgbcon( norm, n, kl, ku, ab, ldab, ipiv, anorm, rcond, work, iwork, info)

call dgbcon( norm, n, kl, ku, ab, ldab, ipiv, anorm, rcond, work, iwork, info)

call cgbcon( norm, n, kl, ku, ab, ldab, ipiv, anorm, rcond, work, rwork, info)

call zgbcon( norm, n, kl, ku, ab, ldab, ipiv, anorm, rcond, work, rwork, info)

Fortran 95:

call gbcon( a, ipiv, anorm, rcond [,kl] [,norm] [,info] )

Description

This routine estimates the reciprocal of the condition number of a general band matrix A in the1-norm or infinity-norm:

κ1(A) =||A||1||A-1||1 = κ∞(A

T) = κ∞(AH)

κ∞(A) =||A||∞||A-1||∞ = κ1(AT) = κ1(AH) .

365



• compute anorm (either ||A||1 =maxj Σi |aij| or ||A||∞ = maxi Σj |aij|)

• call ?gbtrf to compute the LU factorization of A.

Input Parameters





INTEGER. The first dimension of the array ab. (ldab ≥ 2*kl + ku +1).ldab

INTEGER. Array, DIMENSION at least max(1, n). The ipiv array, asreturned by ?gbtrf.

ipiv

REAL for sgbconab, workDOUBLE PRECISION for dgbconCOMPLEX for cgbconDOUBLE COMPLEX for zgbcon.Arrays: ab(ldab,*), work(*).The array ab contains the factored band matrix A, as returned by?gbtrf.The second dimension of ab must be at least max(1,n). The array workis a workspace for the routine.The dimension of work must be at least max(1, 3*n) for real flavorsand max(1, 2*n) for complex flavors.

REAL for single precision flavors.anormDOUBLE PRECISION for double precision flavors.The norm of the original matrix A (see Description).


REAL for cgbconrworkDOUBLE PRECISION for zgbcon.Workspace array, DIMENSION at least max(1, 2*n).

366


Output Parameters

REAL for single precision flavors.rcondDOUBLE PRECISION for double precision flavors.An estimate of the reciprocal of the condition number. The routine setsrcond =0 if the estimate underflows; in this case the matrix is singular(to working precision). However, anytime rcond is small compared to1.0, for the working precision, the matrix may be poorly conditionedor even singular.




Specific details for the routine gbcon interface are as follows:


a





Application Notes

The computed rcond is never less than r (the reciprocal of the true condition number) and inpractice is nearly always less than 10r. A call to this routine involves solving a number ofsystems of linear equations A*x = b or AH*x = b; the number is usually 4 or 5 and nevermore than 11. Each solution requires approximately 2n(ku + 2kl) floating-point operationsfor real flavors and 8n(ku + 2kl) for complex flavors.

367


?gtconEstimates the reciprocal of the condition numberof a tridiagonal matrix using the factorizationcomputed by ?gttrf.

Syntax

Fortran 77:

call sgtcon( norm, n, dl, d, du, du2, ipiv, anorm, rcond, work, iwork, info)

call dgtcon( norm, n, dl, d, du, du2, ipiv, anorm, rcond, work, iwork, info)

call cgtcon( norm, n, dl, d, du, du2, ipiv, anorm, rcond, work, info )

call zgtcon( norm, n, dl, d, du, du2, ipiv, anorm, rcond, work, info )

Fortran 95:

call gtcon( dl, d, du, du2, ipiv, anorm, rcond [,norm] [,info] )

Description

This routine estimates the reciprocal of the condition number of a real or complex tridiagonalmatrix A in the 1-norm or infinity-norm:

κ1(A) = ||A||1||A-1||1

κ∞(A) = ||A||∞||A-1||∞

An estimate is obtained for ||A-1||, and the reciprocal of the condition number is computedas rcond = 1 / (||A|| ||A-1||).


• compute anorm (either ||A||1 = maxj Σi |aij| or ||A||∞ = maxi Σj |aij|)

• call ?gttrf to compute the LU factorization of A.

Input Parameters

CHARACTER*1. Must be '1' or 'O' or 'I'.norm

368


If norm = '1' or 'O', then the routine estimates the condition numberof matrix A in 1-norm.If norm = 'I', then the routine estimates the condition number ofmatrix A in infinity-norm.


REAL for sgtcondl,d,du,du2DOUBLE PRECISION for dgtconCOMPLEX for cgtconDOUBLE COMPLEX for zgtcon.Arrays: dl(n -1), d(n), du(n -1), du2(n -2).The array dl contains the (n - 1) multipliers that define the matrixL from the LU factorization of A as computed by ?gttrf.The array d contains the n diagonal elements of the upper triangularmatrix U from the LU factorization of A.The array du contains the (n - 1) elements of the first superdiagonalof U.The array du2 contains the (n - 2) elements of the second superdiagonalof U.

INTEGER.ipivArray, DIMENSION (n). The array of pivot indices, as returned by?gttrf.


REAL for sgtconworkDOUBLE PRECISION for dgtconCOMPLEX for cgtconDOUBLE COMPLEX for zgtcon.Workspace array, DIMENSION (2*n).

INTEGER. Workspace array, DIMENSION (n). Used for real flavors only.iwork

Output Parameters

REAL for single precision flavors.rcondDOUBLE PRECISION for double precision flavors.

369


An estimate of the reciprocal of the condition number. The routine setsrcond=0 if the estimate underflows; in this case the matrix is singular(to working precision). However, anytime rcond is small compared to1.0, for the working precision, the matrix may be poorly conditionedor even singular.




Specific details for the routine gtcon interface are as follows:







Application Notes

The computed rcond is never less than r (the reciprocal of the true condition number) and inpractice is nearly always less than 10r. A call to this routine involves solving a number ofsystems of linear equations A*x = b; the number is usually 4 or 5 and never more than 11.Each solution requires approximately 2n2 floating-point operations for real flavors and 8n2 forcomplex flavors.

370


?poconEstimates the reciprocal of the condition numberof a symmetric (Hermitian) positive-definite matrix.

Syntax

Fortran 77:

call spocon( uplo, n, a, lda, anorm, rcond, work, iwork, info )

call dpocon( uplo, n, a, lda, anorm, rcond, work, iwork, info )

call cpocon( uplo, n, a, lda, anorm, rcond, work, rwork, info )

call zpocon( uplo, n, a, lda, anorm, rcond, work, rwork, info )

Fortran 95:

call pocon( a, anorm, rcond [,uplo] [,info] )

Description

This routine estimates the reciprocal of the condition number of a symmetric (Hermitian)positive-definite matrix A:

κ1(A) = ||A||1 ||A-1||1 (since A is symmetric or Hermitian, κ∞(A) = κ1(A)).



• call ?potrf to compute the Cholesky factorization of A.

Input Parameters



REAL for spocona, workDOUBLE PRECISION for dpoconCOMPLEX for cpocon

371


DOUBLE COMPLEX for zpocon.Arrays: a(lda,*), work(*).The array a contains the factored matrix A, as returned by ?potrf. Thesecond dimension of a must be at least max(1,n).The array work is a workspace for the routine. The dimension of workmust be at least max(1, 3*n) for real flavors and max(1, 2*n) forcomplex flavors.


REAL for single precision flavorsanormDOUBLE PRECISION for double precision flavors.The norm of the original matrix A (see Description).


REAL for cpoconrworkDOUBLE PRECISION for zpocon.Workspace array, DIMENSION at least max(1, n).

Output Parameters

REAL for single precision flavorsrcondDOUBLE PRECISION for double precision flavors.An estimate of the reciprocal of the condition number. The routine setsrcond =0 if the estimate underflows; in this case the matrix is singular(to working precision). However, anytime rcond is small compared to1.0, for the working precision, the matrix may be poorly conditionedor even singular.




Specific details for the routine pocon interface are as follows:



372


Application Notes


?ppconEstimates the reciprocal of the condition numberof a packed symmetric (Hermitian) positive-definitematrix.

Syntax

Fortran 77:

call sppcon( uplo, n, ap, anorm, rcond, work, iwork, info )

call dppcon( uplo, n, ap, anorm, rcond, work, iwork, info )

call cppcon( uplo, n, ap, anorm, rcond, work, rwork, info )

call zppcon( uplo, n, ap, anorm, rcond, work, rwork, info )

Fortran 95:

call ppcon( a, anorm, rcond [,uplo] [,info] )

Description

This routine estimates the reciprocal of the condition number of a packed symmetric (Hermitian)positive-definite matrix A:




• call ?pptrf to compute the Cholesky factorization of A.

373


Input Parameters



REAL for sppconap, workDOUBLE PRECISION for dppconCOMPLEX for cppconDOUBLE COMPLEX for zppcon.Arrays: ap(*), work(*).The array ap contains the packed factored matrix A, as returned by?pptrf. The dimension of ap must be at least max(1,n(n+1)/2).The array work is a workspace for the routine. The dimension of workmust be at least max(1, 3*n) for real flavors and max(1, 2*n) forcomplex flavors.



REAL for cppconrworkDOUBLE PRECISION for zppcon.Workspace array, DIMENSION at least max(1, n).

Output Parameters



374




Specific details for the routine ppcon interface are as follows:

Stands for the argument ap in Fortan 77 interface. Holds the array Aof size (n*(n+1)/2).

a


Application Notes


?pbconEstimates the reciprocal of the condition numberof a symmetric (Hermitian) positive-definite bandmatrix.

Syntax

Fortran 77:

call spbcon( uplo, n, kd, ab, ldab, anorm, rcond, work, iwork, info )

call dpbcon( uplo, n, kd, ab, ldab, anorm, rcond, work, iwork, info )

call cpbcon( uplo, n, kd, ab, ldab, anorm, rcond, work, rwork, info )

call zpbcon( uplo, n, kd, ab, ldab, anorm, rcond, work, rwork, info )

Fortran 95:

call pbcon( a, anorm, rcond [,uplo] [,info] )

375


Description

This routine estimates the reciprocal of the condition number of a symmetric (Hermitian)positive-definite band matrix A:




• call ?pbtrf to compute the Cholesky factorization of A.

Input Parameters

CHARACTER*1. Must be 'U' or 'L'.uploIndicates how the input matrix A has been factored:If uplo = 'U', the upper triangular factor is stored in ab.If uplo = 'L', the lower triangular factor is stored in ab.



A; kd ≥ 0.

kd

INTEGER. The first dimension of the array ab. (ldab ≥ kd +1).ldab

REAL for spbconab, workDOUBLE PRECISION for dpbconCOMPLEX for cpbconDOUBLE COMPLEX for zpbcon.Arrays: ab(ldab,*), work(*).The array ab contains the factored matrix A in band form, as returnedby ?pbtrf. The second dimension of ab must be at least max(1, n).The array work is a workspace for the routine. The dimension of workmust be at least max(1, 3*n) for real flavors and max(1, 2*n) forcomplex flavors.



REAL for cpbconrwork

376


DOUBLE PRECISION for zpbcon.Workspace array, DIMENSION at least max(1, n).

Output Parameters





Specific details for the routine pbcon interface are as follows:


a


Application Notes

The computed rcond is never less than r (the reciprocal of the true condition number) and inpractice is nearly always less than 10r. A call to this routine involves solving a number ofsystems of linear equations A*x = b; the number is usually 4 or 5 and never more than 11.Each solution requires approximately 4*n(kd + 1) floating-point operations for real flavorsand 16*n(kd + 1) for complex flavors.

377


?ptconEstimates the reciprocal of the condition numberof a symmetric (Hermitian) positive-definitetridiagonal matrix.

Syntax

Fortran 77:

call sptcon( n, d, e, anorm, rcond, work, info )

call dptcon( n, d, e, anorm, rcond, work, info )

call cptcon( n, d, e, anorm, rcond, work, info )

call zptcon( n, d, e, anorm, rcond, work, info )

Fortran 95:

call ptcon( d, e, anorm, rcond [,info] )

Description

This routine computes the reciprocal of the condition number (in the 1-norm) of a real symmetricor complex Hermitian positive-definite tridiagonal matrix using the factorization A = L*D*LT

for real flavors and A = L*D*LH for complex flavors or A = UT*D*U for real flavors and A =UH*D*U for complex flavors computed by ?pttrf :


The norm ||A-1|| is computed by a direct method, and the reciprocal of the condition numberis computed as rcond = 1 / (||A|| ||A-1||).


• compute anorm as ||A||1 = maxj Σi |aij|

• call ?pttrf to compute the factorization of A.

Input Parameters


REAL for single precision flavorsd, workDOUBLE PRECISION for double precision flavors.

378


Arrays, dimension (n).The array d contains the n diagonal elements of the diagonal matrix Dfrom the factorization of A, as computed by ?pttrf ;work is a workspace array.

REAL for sptconeDOUBLE PRECISION for dptconCOMPLEX for cptconDOUBLE COMPLEX for zptcon.Array, DIMENSION (n -1).Contains off-diagonal elements of the unit bidiagonal factor U or L fromthe factorization computed by ?pttrf .

REAL for single precision flavors.anormDOUBLE PRECISION for double precision flavors.The 1- norm of the original matrix A (see Description).

Output Parameters


INTEGER.infoIf info = 0, the execution is successful.If info = -i, the i-th parameter had an illegal value.



Specific details for the routine gtcon interface are as follows:



379


Application Notes

The computed rcond is never less than r (the reciprocal of the true condition number) and inpractice is nearly always less than 10r. A call to this routine involves solving a number ofsystems of linear equations A*x = b; the number is usually 4 or 5 and never more than 11.Each solution requires approximately 4*n(kd + 1) floating-point operations for real flavorsand 16*n(kd + 1) for complex flavors.

?syconEstimates the reciprocal of the condition numberof a symmetric matrix.

Syntax

Fortran 77:

call ssycon( uplo, n, a, lda, ipiv, anorm, rcond, work, iwork, info )

call dsycon( uplo, n, a, lda, ipiv, anorm, rcond, work, iwork, info )

call csycon( uplo, n, a, lda, ipiv, anorm, rcond, work, info )

call zsycon( uplo, n, a, lda, ipiv, anorm, rcond, work, info )

Fortran 95:

call sycon( a, ipiv, anorm, rcond [,uplo] [,info] )

Description

This routine estimates the reciprocal of the condition number of a symmetric matrix A:

κ1(A) = ||A||1 ||A-1||1 (since A is symmetric, κ∞(A) = κ1(A)).



• call ?sytrf to compute the factorization of A.

Input Parameters

CHARACTER*1. Must be 'U' or 'L'.uplo

380


Indicates how the input matrix A has been factored:If uplo = 'U', the array a stores the upper triangular factor U of thefactorization A = P*U*D*UT*PT.If uplo = 'L', the array a stores the lower triangular factor L of thefactorization A = P*L*D*LT*PT.


REAL for ssycona, workDOUBLE PRECISION for dsyconCOMPLEX for csyconDOUBLE COMPLEX for zsycon.Arrays: a(lda,*), work(*).The array a contains the factored matrix A, as returned by ?sytrf. Thesecond dimension of a must be at least max(1,n).The array work is a workspace for the routine.The dimension of work must be at least max(1, 2*n).


INTEGER. Array, DIMENSION at least max(1, n).ipivThe array ipiv, as returned by ?sytrf.



Output Parameters



381




Specific details for the routine sycon interface are as follows:




Application Notes


?heconEstimates the reciprocal of the condition numberof a Hermitian matrix.

Syntax

Fortran 77:

call checon( uplo, n, a, lda, ipiv, anorm, rcond, work, info )

call zhecon( uplo, n, a, lda, ipiv, anorm, rcond, work, info )

Fortran 95:

call hecon( a, ipiv, anorm, rcond [,uplo] [,info] )

Description

This routine estimates the reciprocal of the condition number of a Hermitian matrix A:

κ1(A) = ||A||1 ||A-1||1 (since A is Hermitian, κ∞(A) = κ1(A)).


382


• compute anorm (either ||A||1 =maxj Σi |aij| or ||A||∞ =maxi Σj |aij|)

• call ?hetrf to compute the factorization of A.

Input Parameters

CHARACTER*1. Must be 'U' or 'L'.uploIndicates how the input matrix A has been factored:If uplo = 'U', the array a stores the upper triangular factor U of thefactorization A = P*U*D*UH*PT.If uplo = 'L', the array a stores the lower triangular factor L of thefactorization A = P*L*D*LH*PT.


COMPLEX for checona, workDOUBLE COMPLEX for zhecon.Arrays: a(lda,*), work(*).The array a contains the factored matrix A, as returned by ?hetrf. Thesecond dimension of a must be at least max(1,n).The array work is a workspace for the routine.The dimension of work must be at least max(1, 2*n).


INTEGER. Array, DIMENSION at least max(1, n).ipivThe array ipiv, as returned by ?hetrf.


Output Parameters



383




Specific details for the routine hecon interface are as follows:




Application Notes

The computed rcond is never less than r (the reciprocal of the true condition number) and inpractice is nearly always less than 10r. A call to this routine involves solving a number ofsystems of linear equations A*x = b; the number is usually 5 and never more than 11. Eachsolution requires approximately 8n2 floating-point operations.

?spconEstimates the reciprocal of the condition numberof a packed symmetric matrix.

Syntax

Fortran 77:

call sspcon( uplo, n, ap, ipiv, anorm, rcond, work, iwork, info )

call dspcon( uplo, n, ap, ipiv, anorm, rcond, work, iwork, info )

call cspcon( uplo, n, ap, ipiv, anorm, rcond, work, info )

call zspcon( uplo, n, ap, ipiv, anorm, rcond, work, info )

Fortran 95:

call spcon( a, ipiv, anorm, rcond [,uplo] [,info] )

Description

This routine estimates the reciprocal of the condition number of a packed symmetric matrix A:

κ1(A) = ||A||1 ||A-1||1 (since A is symmetric, κ∞(A) = κ1(A)).

384




• call ?sptrf to compute the factorization of A.

Input Parameters

CHARACTER*1. Must be 'U' or 'L'.uploIndicates how the input matrix A has been factored:If uplo = 'U', the array ap stores the packed upper triangular factorU of the factorization A = P*U*D*UT*PT.If uplo = 'L', the array ap stores the packed lower triangular factorL of the factorization A = P*L*D*LT*PT.


REAL for sspconap, workDOUBLE PRECISION for dspconCOMPLEX for cspconDOUBLE COMPLEX for zspcon.Arrays: ap(*), work(*).The array ap contains the packed factored matrix A, as returned by?sptrf. The dimension of ap must be at least max(1,n(n+1)/2).The array work is a workspace for the routine.The dimension of work must be at least max(1, 2*n).

INTEGER. Array, DIMENSION at least max(1, n).ipivThe array ipiv, as returned by ?sptrf.



Output Parameters


385


An estimate of the reciprocal of the condition number. The routine setsrcond = 0 if the estimate underflows; in this case the matrix is singular(to working precision). However, anytime rcond is small compared to1.0, for the working precision, the matrix may be poorly conditionedor even singular.




Specific details for the routine spcon interface are as follows:


a



Application Notes


?hpconEstimates the reciprocal of the condition numberof a packed Hermitian matrix.

Syntax

Fortran 77:

call chpcon( uplo, n, ap, ipiv, anorm, rcond, work, info )

call zhpcon( uplo, n, ap, ipiv, anorm, rcond, work, info )

386


Fortran 95:

call hpcon( a, ipiv, anorm, rcond [,uplo] [,info] )

Description

This routine estimates the reciprocal of the condition number of a Hermitian matrix A:

κ1(A) = ||A||1 ||A-1||1 (since A is Hermitian, κ∞(A) = k1(A)).


• compute anorm (either ||A||1 =maxj Σi |aij| or ||A||∞ =maxi Σj |aij|)

• call ?hptrf to compute the factorization of A.

Input Parameters

CHARACTER*1. Must be 'U' or 'L'.uploIndicates how the input matrix A has been factored:If uplo = 'U', the array ap stores the packed upper triangular factorU of the factorization A = P*U*D*UT*PT.If uplo = 'L', the array ap stores the packed lower triangular factorL of the factorization A = P*L*D*LT*PT.


COMPLEX for chpconap, workDOUBLE COMPLEX for zhpcon.Arrays: ap(*), work(*).The array ap contains the packed factored matrix A, as returned by?hptrf. The dimension of ap must be at least max(1,n(n+1)/2).The array work is a workspace for the routine.The dimension of work must be at least max(1, 2*n).

INTEGER.ipivArray, DIMENSION at least max(1, n). The array ipiv, as returned by?hptrf.


387


Output Parameters





Specific details for the routine hbcon interface are as follows:


a


Application Notes

The computed rcond is never less than r (the reciprocal of the true condition number) and inpractice is nearly always less than 10r. A call to this routine involves solving a number ofsystems of linear equations A*x = b; the number is usually 5 and never more than 11. Eachsolution requires approximately 8n2 floating-point operations.

388


?trconEstimates the reciprocal of the condition numberof a triangular matrix.

Syntax

Fortran 77:

call strcon( norm, uplo, diag, n, a, lda, rcond, work, iwork, info )

call dtrcon( norm, uplo, diag, n, a, lda, rcond, work, iwork, info )

call ctrcon( norm, uplo, diag, n, a, lda, rcond, work, rwork, info )

call ztrcon( norm, uplo, diag, n, a, lda, rcond, work, rwork, info )

Fortran 95:

call trcon( a, rcond [,uplo] [,diag] [,norm] [,info] )

Description

This routine estimates the reciprocal of the condition number of a triangular matrix A in eitherthe 1-norm or infinity-norm:

κ1(A) =||A||1 ||A-1||1 = κ∞(AT) = κ∞(AH)

κ∞ (A) =||A||∞ ||A-1||∞ =k1 (AT) = κ1 (AH) .

Input Parameters


CHARACTER*1. Must be 'U' or 'L'.uploIndicates whether A is upper or lower triangular:If uplo = 'U', the array a stores the upper triangle of A, other arrayelements are not referenced.If uplo = 'L', the array a stores the lower triangle of A, other arrayelements are not referenced.

CHARACTER*1. Must be 'N' or 'U'.diag

389


If diag = 'N', then A is not a unit triangular matrix.If diag = 'U', then A is unit triangular: diagonal elements are assumedto be 1 and not referenced in the array a.


REAL for strcona, workDOUBLE PRECISION for dtrconCOMPLEX for ctrconDOUBLE COMPLEX for ztrcon.Arrays: a(lda,*), work(*).The array a contains the matrix A. The second dimension of a must beat least max(1,n).The array work is a workspace for the routine. The dimension of workmust be at least max(1, 3*n) for real flavors and max(1, 2*n) forcomplex flavors.



REAL for ctrconrworkDOUBLE PRECISION for ztrcon.Workspace array, DIMENSION at least max(1, n).

Output Parameters





390


Specific details for the routine trcon interface are as follows:





Application Notes

The computed rcond is never less than r (the reciprocal of the true condition number) and inpractice is nearly always less than 10r. A call to this routine involves solving a number ofsystems of linear equations A*x = b; the number is usually 4 or 5 and never more than 11.Each solution requires approximately n2 floating-point operations for real flavors and 4n2

operations for complex flavors.

?tpconEstimates the reciprocal of the condition numberof a packed triangular matrix.

Syntax

Fortran 77:

call stpcon( norm, uplo, diag, n, ap, rcond, work, iwork, info )

call dtpcon( norm, uplo, diag, n, ap, rcond, work, iwork, info )

call ctpcon( norm, uplo, diag, n, ap, rcond, work, rwork, info )

call ztpcon( norm, uplo, diag, n, ap, rcond, work, rwork, info )

Fortran 95:

call tpcon( a, rcond [,uplo] [,diag] [,norm] [,info] )

Description

This routine estimates the reciprocal of the condition number of a packed triangular matrix Ain either the 1-norm or infinity-norm:

κ1(A) =||A||1 ||A-1||1 = κ∞(AT) = κ∞(AH)

κ∞(A) =||A||∞ ||A-1||∞ =κ1 (AT) = κ1(AH) .

391


Input Parameters


CHARACTER*1. Must be 'U' or 'L'. Indicates whether A is upper orlower triangular:

uplo

If uplo = 'U', the array ap stores the upper triangle of A in packedform.If uplo = 'L', the array ap stores the lower triangle of A in packedform.

CHARACTER*1. Must be 'N' or 'U'.diagIf diag = 'N', then A is not a unit triangular matrix.If diag = 'U', then A is unit triangular: diagonal elements are assumedto be 1 and not referenced in the array ap.


REAL for stpconap, workDOUBLE PRECISION for dtpconCOMPLEX for ctpconDOUBLE COMPLEX for ztpcon.Arrays: ap(*), work(*).The array ap contains the packed matrix A. The dimension of ap mustbe at least max(1,n(n+1)/2). The array work is a workspace for theroutine.The dimension of work must be at least max(1, 3*n) for real flavorsand max(1, 2*n) for complex flavors.


REAL for ctpconrworkDOUBLE PRECISION for ztpcon.Workspace array, DIMENSION at least max(1, n).

Output Parameters


392


An estimate of the reciprocal of the condition number. The routine setsrcond =0 if the estimate underflows; in this case the matrix is singular(to working precision). However, anytime rcond is small compared to1.0, for the working precision, the matrix may be poorly conditionedor even singular.




Specific details for the routine tpcon interface are as follows:


a




Application Notes

The computed rcond is never less than r (the reciprocal of the true condition number) and inpractice is nearly always less than 10r. A call to this routine involves solving a number ofsystems of linear equations A*x = b; the number is usually 4 or 5 and never more than 11.Each solution requires approximately n2 floating-point operations for real flavors and 4n2

operations for complex flavors.

393


?tbconEstimates the reciprocal of the condition numberof a triangular band matrix.

Syntax

Fortran 77:

call stbcon( norm, uplo, diag, n, kd, ab, ldab, rcond, work, iwork, info )

call dtbcon( norm, uplo, diag, n, kd, ab, ldab, rcond, work, iwork, info )

call ctbcon( norm, uplo, diag, n, kd, ab, ldab, rcond, work, rwork, info )

call ztbcon( norm, uplo, diag, n, kd, ab, ldab, rcond, work, rwork, info )

Fortran 95:

call tbcon( a, rcond [,uplo] [,diag] [,norm] [,info] )

Description

This routine estimates the reciprocal of the condition number of a triangular band matrix A ineither the 1-norm or infinity-norm:

κ1(A) =||A||1 ||A-1||1 = κ∞(AT) = κ∞(AH)

κ∞(A) =||A||∞ ||A-1||∞ =κ1 (AT) = κ1(AH) .

Input Parameters


CHARACTER*1. Must be 'U' or 'L'. Indicates whether A is upper orlower triangular:

uplo

If uplo = 'U', the array ap stores the upper triangle of A in packedform.If uplo = 'L', the array ap stores the lower triangle of A in packedform.

CHARACTER*1. Must be 'N' or 'U'.diag

394


If diag = 'N', then A is not a unit triangular matrix.If diag = 'U', then A is unit triangular: diagonal elements are assumedto be 1 and not referenced in the array ab.



A; kd ≥ 0.

kd

REAL for stbconab, workDOUBLE PRECISION for dtbconCOMPLEX for ctbconDOUBLE COMPLEX for ztbcon.Arrays: ab(ldab,*), work(*).The array ab contains the band matrix A. The second dimension of abmust be at least max(1,n). The array work is a workspace for theroutine.The dimension of work must be at least max(1, 3*n) for real flavorsand max(1, 2*n) for complex flavors.

INTEGER. The first dimension of the array ab. (ldab ≥ kd +1).ldab


REAL for ctbconrworkDOUBLE PRECISION for ztbcon.Workspace array, DIMENSION at least max(1, n).

Output Parameters



395




Specific details for the routine tbcon interface are as follows:


a




Application Notes

The computed rcond is never less than r (the reciprocal of the true condition number) and inpractice is nearly always less than 10r. A call to this routine involves solving a number ofsystems of linear equations A*x = b; the number is usually 4 or 5 and never more than 11.Each solution requires approximately 2*n(kd + 1) floating-point operations for real flavorsand 8*n(kd + 1) operations for complex flavors.

Refining the Solution and Estimating Its Error

This section describes the LAPACK routines for refining the computed solution of a system oflinear equations and estimating the solution error. You can call these routines after factorizingthe matrix of the system of equations and computing the solution (see Routines for MatrixFactorization and Routines for Solving Systems of Linear Equations).

396


?gerfsRefines the solution of a system of linear equationswith a general matrix and estimates its error.

Syntax

Fortran 77:

call sgerfs( trans, n, nrhs, a, lda, af, ldaf, ipiv, b, ldb, x, ldx, ferr,berr, work, iwork, info )

call dgerfs( trans, n, nrhs, a, lda, af, ldaf, ipiv, b, ldb, x, ldx, ferr,berr, work, iwork, info )

call cgerfs( trans, n, nrhs, a, lda, af, ldaf, ipiv, b, ldb, x, ldx, ferr,berr, work, rwork, info )

call zgerfs( trans, n, nrhs, a, lda, af, ldaf, ipiv, b, ldb, x, ldx, ferr,berr, work, rwork, info )

Fortran 95:

call gerfs( a, af, ipiv, b, x [,trans] [,ferr] [,berr] [,info] )

Description

This routine performs an iterative refinement of the solution to a system of linear equationsA*X = B or AT*X = B or AH*X = B with a general matrix A, with multiple right-hand sides. Foreach computed solution vector x, the routine computes the component-wise backward error

β. This error is the smallest relative perturbation in elements of A and b such that x is the exactsolution of the perturbed system:

|δaij|/|aij| ≤ β|aij|, |δbi|/|bi| ≤ β |bi| such that (A + δA)x = (b + δb).

Finally, the routine estimates the component-wise forward error in the computed solution

||x - xe||∞/||x||∞ (here xe is the exact solution).


• call the factorization routine ?getrf

• call the solver routine ?getrs.

397


Input Parameters

CHARACTER*1. Must be 'N' or 'T' or 'C'.transIndicates the form of the equations:If trans = 'N', the system has the form A*X = B.If trans = 'T', the system has the form AT*X = B.If trans = 'C', the system has the form AH*X = B.



REAL for sgerfsa,af,b,x,workDOUBLE PRECISION for dgerfsCOMPLEX for cgerfsDOUBLE COMPLEX for zgerfs.Arrays:a(lda,*) contains the original matrix A, as supplied to ?getrf.af(ldaf,*) contains the factored matrix A, as returned by ?getrf.b(ldb,*) contains the right-hand side matrix B.x(ldx,*) contains the solution matrix X.work(*) is a workspace array.The second dimension of a and af must be at least max(1, n); thesecond dimension of b and x must be at least max(1, nrhs); thedimension of work must be at least max(1, 3*n) for real flavors andmax(1, 2*n) for complex flavors.


INTEGER. The first dimension of af; ldaf ≥ max(1, n).ldaf


INTEGER. The first dimension of x; ldx ≥ max(1, n).ldx

INTEGER.ipivArray, DIMENSION at least max(1, n).The ipiv array, as returned by ?getrf.

INTEGER.iworkWorkspace array, DIMENSION at least max(1, n).

REAL for cgerfsrworkDOUBLE PRECISION for zgerfs.

398


Workspace array, DIMENSION at least max(1, n).

Output Parameters

The refined solution matrix X.x

REAL for single precision flavorsferr, berrDOUBLE PRECISION for double precision flavors.Arrays, DIMENSION at least max(1, nrhs). Contain the component-wiseforward and backward errors, respectively, for each solution vector.




Specific details for the routine gerfs interface are as follows:


Holds the matrix AF of size (n, n).af



Holds the matrix X of size (n, nrhs).x

Holds the vector of length (nrhs).ferr

Holds the vector of length (nrhs).berr


Application Notes

The bounds returned in ferr are not rigorous, but in practice they almost always overestimatethe actual error.

For each right-hand side, computation of the backward error involves a minimum of 4n2

floating-point operations (for real flavors) or 16n2 operations (for complex flavors). In addition,each step of iterative refinement involves 6n2 operations (for real flavors) or 24n2 operations(for complex flavors); the number of iterations may range from 1 to 5. Estimating the forward

399


error involves solving a number of systems of linear equations A*x = b; the number is usually4 or 5 and never more than 11. Each solution requires approximately 2n2 floating-pointoperations for real flavors or 8n2 for complex flavors.

?gbrfsRefines the solution of a system of linear equationswith a general band matrix and estimates its error.

Syntax

Fortran 77:

call sgbrfs( trans, n, kl, ku, nrhs, ab, ldab, afb, ldafb, ipiv, b, ldb, x,ldx, ferr, berr, work, iwork, info )

call dgbrfs( trans, n, kl, ku, nrhs, ab, ldab, afb, ldafb, ipiv, b, ldb, x,ldx, ferr, berr, work, iwork, info )

call cgbrfs( trans, n, kl, ku, nrhs, ab, ldab, afb, ldafb, ipiv, b, ldb, x,ldx, ferr, berr, work, rwork, info )

call zgbrfs( trans, n, kl, ku, nrhs, ab, ldab, afb, ldafb, ipiv, b, ldb, x,ldx, ferr, berr, work, rwork, info )

Fortran 95:

call gbrfs( a, af, ipiv, b, x [,kl] [,trans] [,ferr] [,berr] [,info] )

Description

This routine performs an iterative refinement of the solution to a system of linear equationsA*X = B or AT*X = B or AH*X = B with a band matrix A, with multiple right-hand sides. Foreach computed solution vector x, the routine computes the component-wise backward error


|δaij|/|aij| ≤ β|aij|, |δbi|/|bi| ≤ β|bi| such that (A + δA)x = (b + δb).




• call the factorization routine ?gbtrf

400


• call the solver routine ?gbtrs.

Input Parameters



INTEGER. The number of sub-diagonals within the band of A; kl ≥ 0.kl

INTEGER. The number of super-diagonals within the band of A; ku ≥ 0.ku


REAL for sgbrfsab,afb,b,x,workDOUBLE PRECISION for dgbrfsCOMPLEX for cgbrfsDOUBLE COMPLEX for zgbrfs.Arrays:ab(ldab,*) contains the original band matrix A, as supplied to ?gbtrf,but stored in rows from 1 to kl + ku + 1.afb(ldafb,*) contains the factored band matrix A, as returned by?gbtrf.b(ldb,*) contains the right-hand side matrix B.x(ldx,*) contains the solution matrix X.work(*) is a workspace array.The second dimension of ab and afb must be at least max(1, n); thesecond dimension of b and x must be at least max(1, nrhs); thedimension of work must be at least max(1, 3*n) for real flavors andmax(1, 2*n) for complex flavors.

INTEGER. The first dimension of ab.ldab

INTEGER. The first dimension of afb .ldafb



INTEGER.ipiv

401


Array, DIMENSION at least max(1, n). The ipiv array, as returned by?gbtrf.


REAL for cgbrfsrworkDOUBLE PRECISION for zgbrfs.Workspace array, DIMENSION at least max(1, n).

Output Parameters



INTEGER. If info =0, the execution is successful.infoIf info = -i, the i-th parameter had an illegal value.



Specific details for the routine gbrfs interface are as follows:

Stands for argument ab in Fortan 77 interface. Holds the array A of size(kl+ku+1,n).

a

Stands for argument afb in Fortan 77 interface. Holds the array AF ofsize (2*kl*ku+1,n).

af








Restored as ku = lda-kl-1.ku

402


Application Notes


For each right-hand side, computation of the backward error involves a minimum of 4n(kl +ku) floating-point operations (for real flavors) or 16n(kl + ku) operations (for complex flavors).In addition, each step of iterative refinement involves 2n(4kl + 3ku) operations (for real flavors)or 8n(4kl + 3ku) operations (for complex flavors); the number of iterations may range from1 to 5. Estimating the forward error involves solving a number of systems of linear equationsA*x = b; the number is usually 4 or 5 and never more than 11. Each solution requiresapproximately 2n2 floating-point operations for real flavors or 8n2 for complex flavors.

?gtrfsRefines the solution of a system of linear equationswith a tridiagonal matrix and estimates its error.

Syntax

Fortran 77:

call sgtrfs( trans, n, nrhs, dl, d, du, dlf, df, duf, du2, ipiv, b, ldb, x,ldx, ferr, berr, work, iwork, info )

call dgtrfs( trans, n, nrhs, dl, d, du, dlf, df, duf, du2, ipiv, b, ldb, x,ldx, ferr, berr, work, iwork, info )

call cgtrfs( trans, n, nrhs, dl, d, du, dlf, df, duf, du2, ipiv, b, ldb, x,ldx, ferr, berr, work, rwork, info )

call zgtrfs( trans, n, nrhs, dl, d, du, dlf, df, duf, du2, ipiv, b, ldb, x,ldx, ferr, berr, work, rwork, info )

Fortran 95:

call gtrfs( dl, d, du, dlf, df, duf, du2, ipiv, b, x [,trans] [,ferr] [,berr][,info] )

403


Description

This routine performs an iterative refinement of the solution to a system of linear equationsA*X = B or AT*X = B or AH*X = B with a tridiagonal matrix A, with multiple right-hand sides.For each computed solution vector x, the routine computes the component-wise backward

error β. This error is the smallest relative perturbation in elements of A and b such that x isthe exact solution of the perturbed system:





• call the factorization routine ?gttrf

• call the solver routine ?gttrs.

Input Parameters





nrhs

REAL for sgtrfsdl,d,du,dlf,DOUBLE PRECISION for dgtrfs

df,duf,du2, COMPLEX for cgtrfsDOUBLE COMPLEX for zgtrfs.

b,x,work Arrays:dl, dimension (n -1), contains the subdiagonal elements of A.d, dimension (n), contains the diagonal elements of A.du, dimension (n -1), contains the superdiagonal elements of A.dlf, dimension (n -1), contains the (n - 1) multipliers that define thematrix L from the LU factorization of A as computed by ?gttrf.

404


df, dimension (n), contains the n diagonal elements of the uppertriangular matrix U from the LU factorization of A.duf, dimension (n -1), contains the (n - 1) elements of the firstsuperdiagonal of U.du2, dimension (n -2), contains the (n - 2) elements of the secondsuperdiagonal of U.b(ldb,nrhs) contains the right-hand side matrix B.x(ldx,nrhs) contains the solution matrix X, as computed by ?gttrs.work(*) is a workspace array; the dimension of work must be at leastmax(1, 3*n) for real flavors and max(1, 2*n) for complex flavors.



INTEGER.ipivArray, DIMENSION at least max(1, n). The ipiv array, as returned by?gttrf.


REAL for cgtrfsrworkDOUBLE PRECISION for zgtrfs.Workspace array, DIMENSION (n). Used for complex flavors only.

Output Parameters


REAL for single precision flavorsferr, berrDOUBLE PRECISION for double precision flavors.Arrays, DIMENSION at least max(1,nrhs). Contain the component-wiseforward and backward errors, respectively, for each solution vector.




Specific details for the routine gtrfs interface are as follows:

405





Holds the vector of length (n-1).dlf

Holds the vector of length (n).df

Holds the vector of length (n-1).duf



Holds the matrix B of size (n,nrhs).b

Holds the matrix X of size (n,nrhs).x




?porfsRefines the solution of a system of linear equationswith a symmetric (Hermitian) positive-definitematrix and estimates its error.

Syntax

Fortran 77:

call sporfs( uplo, n, nrhs, a, lda, af, ldaf, b, ldb, x, ldx, ferr, berr,work, iwork, info )

call dporfs( uplo, n, nrhs, a, lda, af, ldaf, b, ldb, x, ldx, ferr, berr,work, iwork, info )

call cporfs( uplo, n, nrhs, a, lda, af, ldaf, b, ldb, x, ldx, ferr, berr,work, rwork, info )

call zporfs( uplo, n, nrhs, a, lda, af, ldaf, b, ldb, x, ldx, ferr, berr,work, rwork, info )

Fortran 95:

call porfs( a, af, b, x [,uplo] [,ferr] [,berr] [,info] )

406


Description

This routine performs an iterative refinement of the solution to a system of linear equationsA*X = B with a symmetric (Hermitian) positive definite matrix A, with multiple right-hand sides.For each computed solution vector x, the routine computes the component-wise backward

error β. This error is the smallest relative perturbation in elements of A and b such that x isthe exact solution of the perturbed system:





• call the factorization routine ?potrf

• call the solver routine ?potrs.

Input Parameters

CHARACTER*1. Must be 'U' or 'L'.uploIf uplo = 'U', the upper triangle of A is stored.If uplo = 'L', the lower triangle of A is stored.



REAL for sporfsa,af,b,x,workDOUBLE PRECISION for dporfsCOMPLEX for cporfsDOUBLE COMPLEX for zporfs.Arrays:a(lda,*) contains the original matrix A, as supplied to ?potrf.af(ldaf,*) contains the factored matrix A, as returned by ?potrf.b(ldb,*) contains the right-hand side matrix B.x(ldx,*) contains the solution matrix X.work(*) is a workspace array.The second dimension of a and af must be at least max(1, n); thesecond dimension of b and x must be at least max(1, nrhs); thedimension of work must be at least max(1, 3*n) for real flavors andmax(1, 2*n) for complex flavors.

407







REAL for cporfsrworkDOUBLE PRECISION for zporfs.Workspace array, DIMENSION at least max(1, n).

Output Parameters






Specific details for the routine porfs interface are as follows:


Holds the matrix AF of size (n,n).af






408


Application Notes



floating-point operations (for real flavors) or 16n2 operations (for complex flavors). In addition,each step of iterative refinement involves 6n2 operations (for real flavors) or 24n2 operations(for complex flavors); the number of iterations may range from 1 to 5. Estimating the forwarderror involves solving a number of systems of linear equations A*x = b; the number is usually4 or 5 and never more than 11. Each solution requires approximately 2n2 floating-pointoperations for real flavors or 8n2 for complex flavors.

?pprfsRefines the solution of a system of linear equationswith a packed symmetric (Hermitian)positive-definite matrix and estimates its error.

Syntax

Fortran 77:

call spprfs( uplo, n, nrhs, ap, afp, b, ldb, x, ldx, ferr, berr, work, iwork,info )

call dpprfs( uplo, n, nrhs, ap, afp, b, ldb, x, ldx, ferr, berr, work, iwork,info )

call cpprfs( uplo, n, nrhs, ap, afp, b, ldb, x, ldx, ferr, berr, work, rwork,info )

call zpprfs( uplo, n, nrhs, ap, afp, b, ldb, x, ldx, ferr, berr, work, rwork,info )

Fortran 95:

call pprfs( a, af, b, x [,uplo] [,ferr] [,berr] [,info] )

409


Description

This routine performs an iterative refinement of the solution to a system of linear equationsA*X = B with a packed symmetric (Hermitian)positive definite matrix A, with multiple right-handsides. For each computed solution vector x, the routine computes the component-wise

backward error β. This error is the smallest relative perturbation in elements of A and b suchthat x is the exact solution of the perturbed system:





• call the factorization routine ?pptrf

• call the solver routine ?pptrs.

Input Parameters




REAL for spprfsap,afp,b,x,workDOUBLE PRECISION for dpprfsCOMPLEX for cpprfsDOUBLE COMPLEX for zpprfs.Arrays:ap(*) contains the original packed matrix A, as supplied to ?pptrf.afp(*) contains the factored packed matrix A, as returned by ?pptrf.b(ldb,*) contains the right-hand side matrix B.x(ldx,*) contains the solution matrix X.work(*) is a workspace array.

410


The dimension of arrays ap and afp must be at least max(1,n(n+1)/2);the second dimension of b and x must be at least max(1, nrhs); thedimension of work must be at least max(1, 3*n) for real flavors andmax(1, 2*n) for complex flavors.




REAL for cpprfsrworkDOUBLE PRECISION for zpprfs.Workspace array, DIMENSION at least max(1, n).

Output Parameters


REAL for single precision flavors.ferr, berrDOUBLE PRECISION for double precision flavors.Arrays, DIMENSION at least max(1, nrhs). Contain the component-wiseforward and backward errors, respectively, for each solution vector.




Specific details for the routine pprfs interface are as follows:


a

Stands for argument apf in Fortan 77 interface. Holds the array AF ofsize (n*(n+1)/2).

af





411



Application Notes



floating-point operations (for real flavors) or 16n2 operations (for complex flavors). In addition,each step of iterative refinement involves 6n2 operations (for real flavors) or 24n2 operations(for complex flavors); the number of iterations may range from 1 to 5.

Estimating the forward error involves solving a number of systems of linear equations A*x =b; the number of systems is usually 4 or 5 and never more than 11. Each solution requiresapproximately 2n2 floating-point operations for real flavors or 8n2 for complex flavors.

?pbrfsRefines the solution of a system of linear equationswith a band symmetric (Hermitian) positive-definitematrix and estimates its error.

Syntax

Fortran 77:

call spbrfs( uplo, n, kd, nrhs, ab, ldab, afb, ldafb, b, ldb, x, ldx, ferr,berr, work, iwork, info )

call dpbrfs( uplo, n, kd, nrhs, ab, ldab, afb, ldafb, b, ldb, x, ldx, ferr,berr, work, iwork, info )

call cpbrfs( uplo, n, kd, nrhs, ab, ldab, afb, ldafb, b, ldb, x, ldx, ferr,berr, work, rwork, info )

call zpbrfs( uplo, n, kd, nrhs, ab, ldab, afb, ldafb, b, ldb, x, ldx, ferr,berr, work, rwork, info )

Fortran 95:

call pbrfs( a, af, b, x [,uplo] [,ferr] [,berr] [,info] )

412


Description

This routine performs an iterative refinement of the solution to a system of linear equationsA*X = B with a symmetric (Hermitian) positive definite band matrix A, with multiple right-handsides. For each computed solution vector x, the routine computes the component-wise

backward error β. This error is the smallest relative perturbation in elements of A and b suchthat x is the exact solution of the perturbed system:





• call the factorization routine ?pbtrf

• call the solver routine ?pbtrs.

Input Parameters




A; kd ≥ 0.

kd


REAL for spbrfsab,afb,b,x,workDOUBLE PRECISION for dpbrfsCOMPLEX for cpbrfsDOUBLE COMPLEX for zpbrfs.Arrays:ab(ldab,*) contains the original band matrix A, as supplied to ?pbtrf.afb(ldafb,*) contains the factored band matrix A, as returned by?pbtrf.b(ldb,*) contains the right-hand side matrix B.x(ldx,*) contains the solution matrix X.work(*) is a workspace array.

413


The second dimension of ab and afb must be at least max(1, n); thesecond dimension of b and x must be at least max(1, nrhs); thedimension of work must be at least max(1, 3*n) for real flavors andmax(1, 2*n) for complex flavors.

INTEGER. The first dimension of ab; ldab ≥ kd + 1.ldab

INTEGER. The first dimension of afb; ldafb ≥ kd + 1.ldafb




REAL for cpbrfsrworkDOUBLE PRECISION for zpbrfs.Workspace array, DIMENSION at least max(1, n).

Output Parameters






Specific details for the routine pbrfs interface are as follows:

Stands for argument ab in Fortan 77 interface. Holds the array A of size(kd+1, n).

a

Stands for argument afb in Fortan 77 interface. Holds the array AF ofsize (kd+1, n).

af



414





Application Notes


For each right-hand side, computation of the backward error involves a minimum of 8n*kdfloating-point operations (for real flavors) or 32n*kd operations (for complex flavors). Inaddition, each step of iterative refinement involves 12n*kd operations (for real flavors) or48n*kd operations (for complex flavors); the number of iterations may range from 1 to 5.

Estimating the forward error involves solving a number of systems of linear equations A*x =b; the number is usually 4 or 5 and never more than 11. Each solution requires approximately4n*kd floating-point operations for real flavors or 16n*kd for complex flavors.

?ptrfsRefines the solution of a system of linear equationswith a symmetric (Hermitian) positive-definitetridiagonal matrix and estimates its error.

Syntax

Fortran 77:

call sptrfs( n, nrhs, d, e, df, ef, b, ldb, x, ldx, ferr, berr, work, info )

call dptrfs( n, nrhs, d, e, df, ef, b, ldb, x, ldx, ferr, berr, work, info )

call cptrfs( uplo, n, nrhs, d, e, df, ef, b, ldb, x, ldx, ferr, berr, work,rwork, info )

call cptrfs( uplo, n, nrhs, d, e, df, ef, b, ldb, x, ldx, ferr, berr, work,rwork, info )

Fortran 95:

call ptrfs( d, df, e, ef, b, x [,ferr] [,berr] [,info] )

call ptrfs( d, df, e, ef, b, x [,uplo] [,ferr] [,berr] [,info] )

415


Description

This routine performs an iterative refinement of the solution to a system of linear equationsA*X = B with a symmetric (Hermitian) positive definite tridiagonal matrix A, with multipleright-hand sides. For each computed solution vector x, the routine computes the

component-wise backward error β. This error is the smallest relative perturbation in elementsof A and b such that x is the exact solution of the perturbed system:





• call the factorization routine ?pttrf

• call the solver routine ?pttrs.

Input Parameters

CHARACTER*1. Used for complex flavors only. Must be 'U' or 'L'.uploSpecifies whether the superdiagonal or the subdiagonal of the tridiagonalmatrix A is stored and how A is factored:If uplo = 'U', the array e stores the superdiagonal of A, and A isfactored as UH*D*U.If uplo = 'L', the array e stores the subdiagonal of A, and A is factoredas L*D*LH.



REAL for single precision flavors DOUBLE PRECISION for double precisionflavors

d, df, rwork

Arrays: d(n), df(n), rwork(n).The array d contains the n diagonal elements of the tridiagonal matrixA.The array df contains the n diagonal elements of the diagonal matrixD from the factorization of A as computed by ?pttrf.The array rwork is a workspace array used for complex flavors only.

REAL for sptrfse,ef,b,x,workDOUBLE PRECISION for dptrfs

416


COMPLEX for cptrfsDOUBLE COMPLEX for zptrfs.Arrays: e(n -1), ef(n -1), b(ldb,nrhs), x(ldx,nrhs), work(*).The array e contains the (n - 1) off-diagonal elements of the tridiagonalmatrix A (see uplo).The array ef contains the (n - 1) off-diagonal elements of the unitbidiagonal factor U or L from the factorization computed by ?pttrf(see uplo).The array b contains the matrix B whose columns are the right-handsides for the systems of equations.The array x contains the solution matrix X as computed by ?pttrs.The array work is a workspace array. The dimension of work must beat least 2*n for real flavors, and at least n for complex flavors.


INTEGER. The leading dimension of x; ldx ≥ max(1, n).ldx

Output Parameters






Specific details for the routine ptrfs interface are as follows:




417


Holds the vector of length (n-1).ef





Used in complex flavors only. Must be 'U' or 'L'. The default value is'U'.

uplo

?syrfsRefines the solution of a system of linear equationswith a symmetric matrix and estimates its error.

Syntax

Fortran 77:

call ssyrfs( uplo, n, nrhs, a, lda, af, ldaf, ipiv, b, ldb, x, ldx, ferr,berr, work, iwork, info )

call dsyrfs( uplo, n, nrhs, a, lda, af, ldaf, ipiv, b, ldb, x, ldx, ferr,berr, work, iwork, info )

call csyrfs( uplo, n, nrhs, a, lda, af, ldaf, ipiv, b, ldb, x, ldx, ferr,berr, work, rwork, info )

call zsyrfs( uplo, n, nrhs, a, lda, af, ldaf, ipiv, b, ldb, x, ldx, ferr,berr, work, rwork, info )

Fortran 95:

call syrfs( a, af, ipiv, b, x [,uplo] [,ferr] [,berr] [,info] )

Description

This routine performs an iterative refinement of the solution to a system of linear equationsA*X = B with a symmetric full-storage matrix A, with multiple right-hand sides. For each

computed solution vector x, the routine computes the component-wise backward error β.This error is the smallest relative perturbation in elements of A and b such that x is the exactsolution of the perturbed system:


418





• call the factorization routine ?sytrf

• call the solver routine ?sytrs.

Input Parameters




REAL for ssyrfsa,af,b,x,workDOUBLE PRECISION for dsyrfsCOMPLEX for csyrfsDOUBLE COMPLEX for zsyrfs.Arrays:a(lda,*) contains the original matrix A, as supplied to ?sytrf.af(ldaf,*) contains the factored matrix A, as returned by ?sytrf.b(ldb,*) contains the right-hand side matrix B.x(ldx,*) contains the solution matrix X.work(*) is a workspace array.The second dimension of a and af must be at least max(1, n); thesecond dimension of b and x must be at least max(1, nrhs); thedimension of work must be at least max(1, 3*n) for real flavors andmax(1, 2*n) for complex flavors.





INTEGER.ipiv

419


Array, DIMENSION at least max(1, n). The ipiv array, as returned by?sytrf.


REAL for csyrfsrworkDOUBLE PRECISION for zsyrfs.Workspace array, DIMENSION at least max(1, n).

Output Parameters






Specific details for the routine syrfs interface are as follows:









Application Notes


420



floating-point operations (for real flavors) or 16n2 operations (for complex flavors). In addition,each step of iterative refinement involves 6n2 operations (for real flavors) or 24n2 operations(for complex flavors); the number of iterations may range from 1 to 5. Estimating the forwarderror involves solving a number of systems of linear equations A*x = b; the number is usually4 or 5 and never more than 11. Each solution requires approximately 2n2 floating-pointoperations for real flavors or 8n2 for complex flavors.

?herfsRefines the solution of a system of linear equationswith a complex Hermitian matrix and estimates itserror.

Syntax

Fortran 77:

call cherfs( uplo, n, nrhs, a, lda, af, ldaf, ipiv, b, ldb, x, ldx, ferr,berr, work, rwork, info )

call zherfs( uplo, n, nrhs, a, lda, af, ldaf, ipiv, b, ldb, x, ldx, ferr,berr, work, rwork, info )

Fortran 95:

call herfs( a, af, ipiv, b, x [,uplo] [,ferr] [,berr] [,info] )

Description

This routine performs an iterative refinement of the solution to a system of linear equationsA*X = B with a complex Hermitian full-storage matrix A, with multiple right-hand sides. Foreach computed solution vector x, the routine computes the component-wise backward error






• call the factorization routine ?hetrf

421


• call the solver routine ?hetrs.

Input Parameters




COMPLEX for cherfsa,af,b,x,workDOUBLE COMPLEX for zherfs.Arrays:a(lda,*) contains the original matrix A, as supplied to ?hetrf.af(ldaf,*) contains the factored matrix A, as returned by ?hetrf.b(ldb,*) contains the right-hand side matrix B.x(ldx,*) contains the solution matrix X.work(*) is a workspace array.The second dimension of a and af must be at least max(1, n); thesecond dimension of b and x must be at least max(1, nrhs); thedimension of work must be at least max(1, 2*n).





INTEGER.ipivArray, DIMENSION at least max(1, n). The ipiv array, as returned by?hetrf.

REAL for cherfsrworkDOUBLE PRECISION for zherfs.Workspace array, DIMENSION at least max(1, n).

Output Parameters


REAL for cherfsferr, berr

422


DOUBLE PRECISION for zherfs.Arrays, DIMENSION at least max(1, nrhs). Contain the component-wiseforward and backward errors, respectively, for each solution vector.




Specific details for the routine herfs interface are as follows:









Application Notes



operations. In addition, each step of iterative refinement involves 24n2 operations; the numberof iterations may range from 1 to 5.

Estimating the forward error involves solving a number of systems of linear equations A*x =b; the number is usually 4 or 5 and never more than 11. Each solution requires approximately8n2 floating-point operations.

The real counterpart of this routine is ?ssyrfs/?dsyrfs

423


?sprfsRefines the solution of a system of linear equationswith a packed symmetric matrix and estimates thesolution error.

Syntax

Fortran 77:

call ssprfs( uplo, n, nrhs, ap, afp, ipiv, b, ldb, x, ldx, ferr, berr, work,iwork, info )

call dsprfs( uplo, n, nrhs, ap, afp, ipiv, b, ldb, x, ldx, ferr, berr, work,iwork, info )

call csprfs( uplo, n, nrhs, ap, afp, ipiv, b, ldb, x, ldx, ferr, berr, work,rwork, info )

call zsprfs( uplo, n, nrhs, ap, afp, ipiv, b, ldb, x, ldx, ferr, berr, work,rwork, info )

Fortran 95:

call sprfs( a, af, ipiv, b, x [,uplo] [,ferr] [,berr] [,info] )

Description

This routine performs an iterative refinement of the solution to a system of linear equationsA*X = B with a packed symmetric matrix A, with multiple right-hand sides. For each computed

solution vector x, the routine computes the component-wise backward error β. This erroris the smallest relative perturbation in elements of A and b such that x is the exact solution ofthe perturbed system:





• call the factorization routine ?sptrf

• call the solver routine ?sptrs.

424


Input Parameters




REAL for ssprfsap,afp,b,x,workDOUBLE PRECISION for dsprfsCOMPLEX for csprfsDOUBLE COMPLEX for zsprfs.Arrays:ap(*) contains the original packed matrix A, as supplied to ?sptrf.afp(*) contains the factored packed matrix A, as returned by ?sptrf.b(ldb,*) contains the right-hand side matrix B.x(ldx,*) contains the solution matrix X.work(*) is a workspace array.The dimension of arrays ap and afp must be at least max(1,n(n+1)/2); the second dimension of b and x must be at least max(1,nrhs); the dimension of work must be at least max(1, 3*n) for realflavors and max(1, 2*n) for complex flavors.





REAL for csprfsrworkDOUBLE PRECISION for zsprfs.Workspace array, DIMENSION at least max(1, n).

Output Parameters


REAL for single precision flavors.ferr, berrDOUBLE PRECISION for double precision flavors.

425


Arrays, DIMENSION at least max(1, nrhs). Contain the component-wiseforward and backward errors, respectively, for each solution vector.




Specific details for the routine sprfs interface are as follows:


a

Stands for argument afp in Fortan 77 interface. Holds the array AF ofsize (n*(n+1)/2).

af







Application Notes



floating-point operations (for real flavors) or 16n2 operations (for complex flavors). In addition,each step of iterative refinement involves 6n2 operations (for real flavors) or 24n2 operations(for complex flavors); the number of iterations may range from 1 to 5.

Estimating the forward error involves solving a number of systems of linear equations A*x =b; the number of systems is usually 4 or 5 and never more than 11. Each solution requiresapproximately 2n2 floating-point operations for real flavors or 8n2 for complex flavors.

426


?hprfsRefines the solution of a system of linear equationswith a packed complex Hermitian matrix andestimates the solution error.

Syntax

Fortran 77:

call chprfs( uplo, n, nrhs, ap, afp, ipiv, b, ldb, x, ldx, ferr, berr, work,rwork, info )

call zhprfs( uplo, n, nrhs, ap, afp, ipiv, b, ldb, x, ldx, ferr, berr, work,rwork, info )

Fortran 95:

call hprfs( a, af, ipiv, b, x [,uplo] [,ferr] [,berr] [,info] )

Description

This routine performs an iterative refinement of the solution to a system of linear equationsA*X = B with a packed complex Hermitian matrix A, with multiple right-hand sides. For each






• call the factorization routine ?hptrf

• call the solver routine ?hptrs.

Input Parameters


427




COMPLEX for chprfsap,afp,b,x,workDOUBLE COMPLEX for zhprfs.Arrays:ap(*) contains the original packed matrix A, as supplied to ?hptrf.afp(*) contains the factored packed matrix A, as returned by ?hptrf.b(ldb,*) contains the right-hand side matrix B.x(ldx,*) contains the solution matrix X.work(*) is a workspace array.The dimension of arrays ap and afp must be at least max(1,n(n+1)/2);the second dimension of b and x must be at least max(1,nrhs); thedimension of work must be at least max(1, 2*n).



INTEGER.ipivArray, DIMENSION at least max(1, n). The ipiv array, as returned by?hptrf.

REAL for chprfsrworkDOUBLE PRECISION for zhprfs.Workspace array, DIMENSION at least max(1, n).

Output Parameters


REAL for chprfs.ferr, berrDOUBLE PRECISION for zhprfs.Arrays, DIMENSION at least max(1,nrhs). Contain the component-wiseforward and backward errors, respectively, for each solution vector.




428


Specific details for the routine hprfs interface are as follows:


a


af







Application Notes



operations. In addition, each step of iterative refinement involves 24n2 operations; the numberof iterations may range from 1 to 5.

Estimating the forward error involves solving a number of systems of linear equations A*x =b; the number is usually 4 or 5 and never more than 11. Each solution requires approximately8n2 floating-point operations.

The real counterpart of this routine is ?ssprfs/?dsprfs.

429


?trrfsEstimates the error in the solution of a system oflinear equations with a triangular matrix.

Syntax

Fortran 77:

call strrfs( uplo, trans, diag, n, nrhs, a, lda, b, ldb, x, ldx, ferr, berr,work, iwork, info )

call dtrrfs( uplo, trans, diag, n, nrhs, a, lda, b, ldb, x, ldx, ferr, berr,work, iwork, info )

call ctrrfs( uplo, trans, diag, n, nrhs, a, lda, b, ldb, x, ldx, ferr, berr,work, rwork, info )

call ztrrfs( uplo, trans, diag, n, nrhs, a, lda, b, ldb, x, ldx, ferr, berr,work, rwork, info )

Fortran 95:

call trrfs( a, b, x [,uplo] [,trans] [,diag] [,ferr] [,berr] [,info] )

Description

This routine estimates the errors in the solution to a system of linear equations A*X = B orAT*X = B or AH*X = B with a triangular matrix A, with multiple right-hand sides. For each



The routine also estimates the component-wise forward error in the computed solution


Before calling this routine, call the solver routine ?trtrs.

Input Parameters


430




CHARACTER*1. Must be 'N' or 'U'.diagIf diag = 'N', then A is not a unit triangular matrix.If diag = 'U', then A is unit triangular: diagonal elements of A areassumed to be 1 and not referenced in the array a.



REAL for strrfsa, b, x, workDOUBLE PRECISION for dtrrfsCOMPLEX for ctrrfsDOUBLE COMPLEX for ztrrfs.Arrays:a(lda,*) contains the upper or lower triangular matrix A, as specifiedby uplo.b(ldb,*) contains the right-hand side matrix B.x(ldx,*) contains the solution matrix X.work(*) is a workspace array.The second dimension of a must be at least max(1,n); the seconddimension of b and x must be at least max(1,nrhs); the dimension ofwork must be at least max(1,3*n) for real flavors and max(1,2*n)for complex flavors.





REAL for ctrrfsrworkDOUBLE PRECISION for ztrrfs.Workspace array, DIMENSION at least max(1, n).

431


Output Parameters





Specific details for the routine trrfs interface are as follows:









Application Notes


A call to this routine involves, for each right-hand side, solving a number of systems of linearequations A*x = b; the number of systems is usually 4 or 5 and never more than 11. Eachsolution requires approximately n2 floating-point operations for real flavors or 4n2 for complexflavors.

432


?tprfsEstimates the error in the solution of a system oflinear equations with a packed triangular matrix.

Syntax

Fortran 77:

call stprfs( uplo, trans, diag, n, nrhs, ap, b, ldb, x, ldx, ferr, berr, work,iwork, info )

call dtprfs( uplo, trans, diag, n, nrhs, ap, b, ldb, x, ldx, ferr, berr, work,iwork, info )

call ctprfs( uplo, trans, diag, n, nrhs, ap, b, ldb, x, ldx, ferr, berr, work,rwork, info )

call ztprfs( uplo, trans, diag, n, nrhs, ap, b, ldb, x, ldx, ferr, berr, work,rwork, info )

Fortran 95:

call tprfs( a, b, x [,uplo] [,trans] [,diag] [,ferr] [,berr] [,info] )

Description

This routine estimates the errors in the solution to a system of linear equations A*X = B orAT*X = B or AH*X = B with a packed triangular matrix A, with multiple right-hand sides. Foreach computed solution vector x, the routine computes the component-wise backward error





Before calling this routine, call the solver routine ?tptrs.

Input Parameters


433




CHARACTER*1. Must be 'N' or 'U'.diagIf diag = 'N', A is not a unit triangular matrix.If diag = 'U', A is unit triangular: diagonal elements of A are assumedto be 1 and not referenced in the array ap.



REAL for stprfsap, b, x, workDOUBLE PRECISION for dtprfsCOMPLEX for ctprfsDOUBLE COMPLEX for ztprfs.Arrays:ap(*) contains the upper or lower triangular matrix A, as specified byuplo.b(ldb,*) contains the right-hand side matrix B.x(ldx,*) contains the solution matrix X.work(*) is a workspace array.The dimension of ap must be at least max(1,n(n+1)/2); the seconddimension of b and x must be at least max(1,nrhs); the dimension ofwork must be at least max(1,3*n) for real flavors and max(1,2*n)for complex flavors.




REAL for ctprfsrworkDOUBLE PRECISION for ztprfs.Workspace array, DIMENSION at least max(1, n).

434


Output Parameters





Specific details for the routine tprfs interface are as follows:


a








Application Notes


A call to this routine involves, for each right-hand side, solving a number of systems of linearequations A*x = b; the number of systems is usually 4 or 5 and never more than 11. Eachsolution requires approximately n2 floating-point operations for real flavors or 4n2 for complexflavors.

435


?tbrfsEstimates the error in the solution of a system oflinear equations with a triangular band matrix.

Syntax

Fortran 77:

call stbrfs( uplo, trans, diag, n, kd, nrhs, ab, ldab, b, ldb, x, ldx, ferr,berr, work, iwork, info )

call dtbrfs( uplo, trans, diag, n, kd, nrhs, ab, ldab, b, ldb, x, ldx, ferr,berr, work, iwork, info )

call ctbrfs( uplo, trans, diag, n, kd, nrhs, ab, ldab, b, ldb, x, ldx, ferr,berr, work, rwork, info )

call ztbrfs( uplo, trans, diag, n, kd, nrhs, ab, ldab, b, ldb, x, ldx, ferr,berr, work, rwork, info )

Fortran 95:

call tbrfs( a, b, x [,uplo] [,trans] [,diag] [,ferr] [,berr] [,info] )

Description

This routine estimates the errors in the solution to a system of linear equations A*X = B orAT*X = B or AH*X = B with a triangular band matrix A, with multiple right-hand sides. For each





Before calling this routine, call the solver routine ?tbtrs.

Input Parameters


436




CHARACTER*1. Must be 'N' or 'U'.diagIf diag = 'N', A is not a unit triangular matrix.If diag = 'U', A is unit triangular: diagonal elements of A are assumedto be 1 and not referenced in the array ab.


INTEGER. The number of super-diagonals or sub-diagonals in the matrix

A; kd ≥ 0.

kd


REAL for stbrfsab, b, x, workDOUBLE PRECISION for dtbrfsCOMPLEX for ctbrfsDOUBLE COMPLEX for ztbrfs.Arrays:ab(ldab,*) contains the upper or lower triangular matrix A, as specifiedby uplo, in band storage format.b(ldb,*) contains the right-hand side matrix B.x(ldx,*) contains the solution matrix X.work(*) is a workspace array.The second dimension of a must be at least max(1,n); the seconddimension of b and x must be at least max(1,nrhs). The dimensionof work must be at least max(1,3*n) for real flavors and max(1,2*n)for complex flavors.





REAL for ctbrfsrwork

437


DOUBLE PRECISION for ztbrfs.Workspace array, DIMENSION at least max(1, n).

Output Parameters





Specific details for the routine tbrfs interface are as follows:


a








Application Notes


A call to this routine involves, for each right-hand side, solving a number of systems of linearequations A*x = b; the number of systems is usually 4 or 5 and never more than 11. Eachsolution requires approximately 2n*kd floating-point operations for real flavors or 8n*kdoperations for complex flavors.

438


Routines for Matrix Inversion

It is seldom necessary to compute an explicit inverse of a matrix. In particular, do not attemptto solve a system of equations Ax = b by first computing A-1 and then forming the matrix-vectorproduct x = A-1b. Call a solver routine instead (see Routines for Solving Systems of LinearEquations); this is more efficient and more accurate.

However, matrix inversion routines are provided for the rare occasions when an explicit inversematrix is needed.

?getriComputes the inverse of an LU-factored generalmatrix.

Syntax

Fortran 77:

call sgetri( n, a, lda, ipiv, work, lwork, info )

call dgetri( n, a, lda, ipiv, work, lwork, info )

call cgetri( n, a, lda, ipiv, work, lwork, info )

call zgetri( n, a, lda, ipiv, work, lwork, info )

Fortran 95:

call getri( a, ipiv [,info] )

Description

This routine computes the inverse inv(A) of a general matrix A. Before calling this routine,call ?getrf to factorize A.

Input Parameters


REAL for sgetria, workDOUBLE PRECISION for dgetriCOMPLEX for cgetriDOUBLE COMPLEX for zgetri.

439


Arrays: a(lda,*), work(*).a(lda,*) contains the factorization of the matrix A, as returned by?getrf: A = P*L*U.The second dimension of a must be at least max(1,n).work(*) is a workspace array of dimension at least max(1,lwork).


INTEGER.ipivArray, DIMENSION at least max(1, n).The ipiv array, as returned by ?getrf.

INTEGER. The size of the work array; lwork ≥ n.lwork

If lwork = -1, then a workspace query is assumed; the routine onlycalculates the optimal size of the work array, returns this value as thefirst entry of the work array, and no error message related to lwork isissued by xerbla.

See Application Notes below for the suggested value of lwork.

Output Parameters

Overwritten by the n-by-n matrix inv(A).a


work(1)

INTEGER. If info = 0, the execution is successful.infoIf info = -i, the i-th parameter had an illegal value.If info = i, the i-th diagonal element of the factor U is zero, U issingular, and the inversion could not be completed.



Specific details for the routine getri interface are as follows:



440


Application Notes






The computed inverse X satisfies the following error bound:

|XA - I| ≤ c(n)ε|X|P|L||U|,

where c(n) is a modest linear function of n; ε is the machine precision; I denotes the identitymatrix; P, L, and U are the factors of the matrix factorization A = P*L*U.


441


?potriComputes the inverse of a symmetric (Hermitian)positive-definite matrix.

Syntax

Fortran 77:

call spotri( uplo, n, a, lda, info )

call dpotri( uplo, n, a, lda, info )

call cpotri( uplo, n, a, lda, info )

call zpotri( uplo, n, a, lda, info )

Fortran 95:

call potri( a [,uplo] [,info] )

Description

This routine computes the inverse inv(A) of a symmetric positive definite or, for complexflavors, Hermitian positive-definite matrix A. Before calling this routine, call ?potrf to factorizeA.

Input Parameters



REAL for spotriaDOUBLE PRECISION for dpotriCOMPLEX for cpotriDOUBLE COMPLEX for zpotri.Array a(lda,*). Contains the factorization of the matrix A, as returnedby ?potrf.The second dimension of a must be at least max(1, n).


442


Output Parameters


INTEGER.infoIf info = 0, the execution is successful.If info = -i, the i-th parameter had an illegal value.If info = i, the i-th diagonal element of the Cholesky factor (andtherefore the factor itself) is zero, and the inversion could not becompleted.



Specific details for the routine potri interface are as follows:



Application Notes

The computed inverse X satisfies the following error bounds:

||XA - I||2 ≤ c(n)εκ2(A), ||AX - I||2 ≤ c(n)εκ2(A),

where c(n) is a modest linear function of n, and ε is the machine precision; I denotes theidentity matrix.

The 2-norm ||A||2 of a matrix A is defined by ||A||2 = maxx·x=1(Ax·Ax)1/2, and the condition

number κ2(A) is defined by κ2(A) = ||A||2 ||A-1||2.


443


?pptriComputes the inverse of a packed symmetric(Hermitian) positive-definite matrix

Syntax

Fortran 77:

call spptri( uplo, n, ap, info )

call dpptri( uplo, n, ap, info )

call cpptri( uplo, n, ap, info )

call zpptri( uplo, n, ap, info )

Fortran 95:

call pptri( a [,uplo] [,info] )

Description

This routine computes the inverse inv(A) of a symmetric positive definite or, for complexflavors, Hermitian positive-definite matrix A in packed form. Before calling this routine, call?pptrf to factorize A.

Input Parameters

CHARACTER*1. Must be 'U' or 'L'.uploIndicates whether the upper or lower triangular factor is stored in ap:If uplo = 'U', then the upper triangular factor is stored.If uplo = 'L', then the lower triangular factor is stored.


REAL for spptriapDOUBLE PRECISION for dpptriCOMPLEX for cpptriDOUBLE COMPLEX for zpptri.Array, DIMENSION at least max(1, n(n+1)/2).Contains the factorization of the packed matrix A, as returned by?pptrf.The dimension ap must be at least max(1,n(n+1)/2).

444


Output Parameters

Overwritten by the packed n-by-n matrix inv(A).ap

INTEGER.infoIf info = 0, the execution is successful.If info = -i, the i-th parameter had an illegal value.If info = i, the i-th diagonal element of the Cholesky factor (andtherefore the factor itself) is zero, and the inversion could not becompleted.



Specific details for the routine pptri interface are as follows:


a


Application Notes


||XA - I||2 ≤ c(n)εκ2(A), ||AX - I||2 ≤ c(n)εκ2(A),

where c(n) is a modest linear function of n, and ε is the machine precision; I denotes theidentity matrix.

The 2-norm ||A||2 of a matrix A is defined by ||A||2 =maxx·x=1(Ax·Ax)1/2, and the condition

number κ2(A) is defined by κ2(A) = ||A||2 ||A-1||2 .


445


?sytriComputes the inverse of a symmetric matrix.

Syntax

Fortran 77:

call ssytri( uplo, n, a, lda, ipiv, work, info )

call dsytri( uplo, n, a, lda, ipiv, work, info )

call csytri( uplo, n, a, lda, ipiv, work, info )

call zsytri( uplo, n, a, lda, ipiv, work, info )

Fortran 95:

call sytri( a, ipiv [,uplo] [,info] )

Description

This routine computes the inverse inv(A) of a symmetric matrix A. Before calling this routine,call ?sytrf to factorize A.

Input Parameters

CHARACTER*1. Must be 'U' or 'L'.uploIndicates how the input matrix A has been factored:If uplo = 'U', the array a stores the Bunch-Kaufman factorization A= P*U*D*UT*PT.If uplo = 'L', the array a stores the Bunch-Kaufman factorization A= P*L*D*LT*PT.


REAL for ssytria, workDOUBLE PRECISION for dsytriCOMPLEX for csytriDOUBLE COMPLEX for zsytri.Arrays:a(lda,*) contains the factorization of the matrix A, as returned by?sytrf.The second dimension of a must be at least max(1,n).work(*) is a workspace array.

446


The dimension of work must be at least max(1,2*n).


INTEGER.ipivArray, DIMENSION at least max(1, n).The ipiv array, as returned by ?sytrf.

Output Parameters


INTEGER.infoIf info = 0, the execution is successful.If info =-i, the i-th parameter had an illegal value.If info = i, the i-th diagonal element of D is zero, D is singular, andthe inversion could not be completed.



Specific details for the routine sytri interface are as follows:




Application Notes


|D*UT*PT*X*P*U - I| ≤ c(n)ε(|D||UT|PT|X|P|U| + |D||D-1|)

for uplo = 'U', and

|D*LT*PT*X*P*L - I| ≤ c(n)ε(|D||LT|PT|X|P|L| + |D||D-1|)

for uplo = 'L'. Here c(n) is a modest linear function of n, and ε is the machine precision; Idenotes the identity matrix.


447


?hetriComputes the inverse of a complex Hermitianmatrix.

Syntax

Fortran 77:

call chetri( uplo, n, a, lda, ipiv, work, info )

call zhetri( uplo, n, a, lda, ipiv, work, info )

Fortran 95:

call hetri( a, ipiv [,uplo] [,info] )

Description

This routine computes the inverse inv(A) of a complex Hermitian matrix A. Before calling thisroutine, call ?hetrf to factorize A.

Input Parameters

CHARACTER*1. Must be 'U' or 'L'.uploIndicates how the input matrix A has been factored:If uplo = 'U', the array a stores the Bunch-Kaufman factorization A= P*U*D*UH*PT.If uplo = 'L', the array a stores the Bunch-Kaufman factorization A= P*L*D*LH*PT.


COMPLEX for chetria, workDOUBLE COMPLEX for zhetri.Arrays:a(lda,*) contains the factorization of the matrix A, as returned by?hetrf.The second dimension of a must be at least max(1,n).work(*) is a workspace array.The dimension of work must be at least max(1,n).


INTEGER.ipiv

448


Array, DIMENSION at least max(1, n). The ipiv array, as returned by?hetrf.

Output Parameters


INTEGER.infoIf info = 0, the execution is successful.If info = -i, the i-th parameter had an illegal value.If info = i, the i-th diagonal element of D is zero, D is singular, andthe inversion could not be completed.



Specific details for the routine hetri interface are as follows:




Application Notes


|D*UH*PT*X*P*U - I| ≤ c(n)ε(|D||UH|PT|X|P|U| + |D||D-1|)

for uplo = 'U', and

|D*LH*PT*X*P*L - I| ≤ c(n)ε(|D||LH|PT|X|P|L| + |D||D-1|)



The real counterpart of this routine is ?sytri.

449


?sptriComputes the inverse of a symmetric matrix usingpacked storage.

Syntax

Fortran 77:

call ssptri( uplo, n, ap, ipiv, work, info )

call dsptri( uplo, n, ap, ipiv, work, info )

call csptri( uplo, n, ap, ipiv, work, info )

call zsptri( uplo, n, ap, ipiv, work, info )

Fortran 95:

call sptri( a, ipiv [,uplo] [,info] )

Description

This routine computes the inverse inv(A) of a packed symmetric matrix A. Before calling thisroutine, call ?sptrf to factorize A.

Input Parameters

CHARACTER*1. Must be 'U' or 'L'.uploIndicates how the input matrix A has been factored:If uplo = 'U', the array ap stores the Bunch-Kaufman factorizationA = P*U*D*UT*PT.If uplo = 'L', the array ap stores the Bunch-Kaufman factorizationA = P*L*D*LT*PT.


REAL for ssptriap, workDOUBLE PRECISION for dsptriCOMPLEX for csptriDOUBLE COMPLEX for zsptri.Arrays:ap(*) contains the factorization of the matrix A, as returned by ?sptrf.The dimension of ap must be at least max(1,n(n+1)/2).work(*) is a workspace array.

450


The dimension of work must be at least max(1,n).


Output Parameters

Overwritten by the n-by-n matrix inv(A) in packed form.ap




Specific details for the routine sptri interface are as follows:


a



Application Notes


|D*UT*PT*X*P*U - I| ≤ c(n)ε(|D||UT|PT|X|P|U| + |D||D-1|)

for uplo = 'U', and

|D*LT*PT*X*P*L - I| ≤ c(n)ε(|D||LT|PT|X|P|L| + |D||D-1|)



451


?hptriComputes the inverse of a complex Hermitianmatrix using packed storage.

Syntax

Fortran 77:

call chptri( uplo, n, ap, ipiv, work, info )

call zhptri( uplo, n, ap, ipiv, work, info )

Fortran 95:

call hptri( a, ipiv [,uplo] [,info] )

Description

This routine computes the inverse inv(A) of a complex Hermitian matrix A using packedstorage. Before calling this routine, call ?hptrf to factorize A.

Input Parameters

CHARACTER*1. Must be 'U' or 'L'.uploIndicates how the input matrix A has been factored:If uplo = 'U', the array ap stores the packed Bunch-Kaufmanfactorization A = P*U*D*UH*PT.If uplo = 'L', the array ap stores the packed Bunch-Kaufmanfactorization A = P*L*D*LH*PT.


COMPLEX for chptriap, workDOUBLE COMPLEX for zhptri.Arrays:ap(*) contains the factorization of the matrix A, as returned by ?hptrf.The dimension of ap must be at least max(1,n(n+1)/2).work(*) is a workspace array.The dimension of work must be at least max(1,n).

INTEGER.ipivArray, DIMENSION at least max(1, n).The ipiv array, as returned by ?hptrf.

452


Output Parameters

Overwritten by the n-by-n matrix inv(A).ap




Specific details for the routine hptri interface are as follows:


a



Application Notes


|D*UH*PT*X*P*U - I| ≤ c(n)ε(|D||UH|PT|X|P|U| + |D||D-1|)

for uplo = 'U', and

|D*LH*PT*X*PL - I| ≤ c(n)ε(|D||LH|PT|X|P|L| + |D||D-1|)



The real counterpart of this routine is ?sptri.

453


?trtriComputes the inverse of a triangular matrix.

Syntax

Fortran 77:

call strtri( uplo, diag, n, a, lda, info )

call dtrtri( uplo, diag, n, a, lda, info )

call ctrtri( uplo, diag, n, a, lda, info )

call ztrtri( uplo, diag, n, a, lda, info )

Fortran 95:

call trtri( a [,uplo] [,diag] [,info] )

Description

This routine computes the inverse inv(A) of a triangular matrix A.

Input Parameters


CHARACTER*1. Must be 'N' or 'U'.diagIf diag = 'N', then A is not a unit triangular matrix.If diag = 'U', A is unit triangular: diagonal elements of A are assumedto be 1 and not referenced in the array a.


REAL for strtriaDOUBLE PRECISION for dtrtriCOMPLEX for ctrtriDOUBLE COMPLEX for ztrtri.Array: DIMENSION (,*).Contains the matrix A.The second dimension of a must be at least max(1,n).

454



Output Parameters


INTEGER.infoIf info = 0, the execution is successful.If info = -i, the i-th parameter had an illegal value.If info = i, the i-th diagonal element of A is zero, A is singular, andthe inversion could not be completed.



Specific details for the routine trtri interface are as follows:




Application Notes


|XA - I| ≤ c(n)ε |X||A|

|XA - I| ≤ c(n)ε |A-1||A||X|,

where c(n) is a modest linear function of n; ε is the machine precision; I denotes the identitymatrix.


455


?tptriComputes the inverse of a triangular matrix usingpacked storage.

Syntax

Fortran 77:

call stptri( uplo, diag, n, ap, info )

call dtptri( uplo, diag, n, ap, info )

call ctptri( uplo, diag, n, ap, info )

call ztptri( uplo, diag, n, ap, info )

Fortran 95:

call tptri( a [,uplo] [,diag] [,info] )

Description

This routine computes the inverse inv(A) of a packed triangular matrix A.

Input Parameters


CHARACTER*1. Must be 'N' or 'U'.diagIf diag = 'N', then A is not a unit triangular matrix.If diag = 'U', A is unit triangular: diagonal elements of A are assumedto be 1 and not referenced in the array ap.


REAL for stptriapDOUBLE PRECISION for dtptriCOMPLEX for ctptriDOUBLE COMPLEX for ztptri.Array, DIMENSION at least max(1,n(n+1)/2).Contains the packed triangular matrix A.

456


Output Parameters

Overwritten by the packed n-by-n matrix inv(A) .ap

INTEGER.infoIf info = 0, the execution is successful.If info = -i, the i-th parameter had an illegal value.If info = i, the i-th diagonal element of A is zero, A is singular, andthe inversion could not be completed.



Specific details for the routine tptri interface are as follows:


a



Application Notes


|XA - I| ≤ c(n)ε |X||A|

|X - A-1| ≤ c(n)ε |A-1||A||X|,

where c(n) is a modest linear function of n; ε is the machine precision; I denotes the identitymatrix.


Routines for Matrix Equilibration

Routines described in this section are used to compute scaling factors needed to equilibrate amatrix. Note that these routines do not actually scale the matrices.

457


?geequComputes row and column scaling factors intendedto equilibrate a matrix and reduce its conditionnumber.

Syntax

Fortran 77:

call sgeequ( m, n, a, lda, r, c, rowcnd, colcnd, amax, info )

call dgeequ( m, n, a, lda, r, c, rowcnd, colcnd, amax, info )

call cgeequ( m, n, a, lda, r, c, rowcnd, colcnd, amax, info )

call zgeequ( m, n, a, lda, r, c, rowcnd, colcnd, amax, info )

Fortran 95:

call geequ( a, r, c [,rowcnd] [,colcnd] [,amax] [,info] )

Description

This routine computes row and column scalings intended to equilibrate an m-by-n matrix A andreduce its condition number. The output array r returns the row scale factors and the array cthe column scale factors. These factors are chosen to try to make the largest element in eachrow and column of the matrix B with elements bij=r(i)*aij*c(j) have absolute value 1.

See ?laqge auxiliary function that uses scaling factors computed by ?geequ.

Input Parameters

INTEGER. The number of rows of the matrix A; m ≥ 0.m

INTEGER. The number of columns of the matrix A; n ≥ 0.n

REAL for sgeequaDOUBLE PRECISION for dgeequCOMPLEX for cgeequDOUBLE COMPLEX for zgeequ.Array: DIMENSION (lda,*).Contains the m-by-n matrix A whose equilibration factors are to becomputed.The second dimension of a must be at least max(1,n).

458


INTEGER. The leading dimension of a; lda ≥ max(1, m).lda

Output Parameters

REAL for single precision flavorsr, cDOUBLE PRECISION for double precision flavors.Arrays: r(m), c(n).If info = 0, or info > m, the array r contains the row scale factorsof the matrix A.If info = 0 , the array c contains the column scale factors of thematrix A.

REAL for single precision flavorsrowcndDOUBLE PRECISION for double precision flavors.If info = 0 or info > m, rowcnd contains the ratio of the smallestr(i) to the largest r(i).

REAL for single precision flavorscolcndDOUBLE PRECISION for double precision flavors.If info = 0, colcnd contains the ratio of the smallest c(i) to thelargest c(i).

REAL for single precision flavorsamaxDOUBLE PRECISION for double precision flavors.Absolute value of the largest element of the matrix A.

INTEGER.infoIf info = 0, the execution is successful.If info = -i, the i-th parameter had an illegal value.If info = i and

i ≤ m, the i-th row of A is exactly zero;i > m, the (i-m)th column of A is exactly zero.



Specific details for the routine geequ interface are as follows:

Holds the matrix A of size (m, n).a

Holds the vector of length (m).r

459


Holds the vector of length (n).c

Application Notes

All the components of r and c are restricted to be between SMLNUM = smallest safe numberand BIGNUM= largest safe number. Use of these scaling factors is not guaranteed to reducethe condition number of A but works well in practice.

If rowcnd ≥ 0.1 and amax is neither too large nor too small, it is not worth scaling by r.

If colcnd ≥ 0.1, it is not worth scaling by c.

If amax is very close to overflow or very close to underflow, the matrix A should be scaled.

?gbequComputes row and column scaling factors intendedto equilibrate a band matrix and reduce itscondition number.

Syntax

Fortran 77:

call sgbequ( m, n, kl, ku, ab, ldab, r, c, rowcnd, colcnd, amax, info )

call dgbequ( m, n, kl, ku, ab, ldab, r, c, rowcnd, colcnd, amax, info )

call cgbequ( m, n, kl, ku, ab, ldab, r, c, rowcnd, colcnd, amax, info )

call zgbequ( m, n, kl, ku, ab, ldab, r, c, rowcnd, colcnd, amax, info )

Fortran 95:

call gbequ( a, r, c [,kl] [,rowcnd] [,colcnd] [,amax] [,info] )

Description

This routine computes row and column scalings intended to equilibrate an m-by-n band matrixA and reduce its condition number. The output array r returns the row scale factors and thearray c the column scale factors. These factors are chosen to try to make the largest elementin each row and column of the matrix B with elements bij=r(i)*aij*c(j) have absolute value1.

See ?laqgb auxiliary function that uses scaling factors computed by ?gbequ.

460


Input Parameters

INTEGER. The number of rows of the matrix A; m ≥ 0.m

INTEGER. The number of columns of the matrix A; n ≥ 0.n


INTEGER. The number of superdiagonals within the band of A; ku ≥0.

ku

REAL for sgbequabDOUBLE PRECISION for dgbequCOMPLEX for cgbequDOUBLE COMPLEX for zgbequ.Array, DIMENSION (ldab,*).Contains the original band matrix A stored in rows from 1 to kl + ku+ 1.The second dimension of ab must be at least max(1,n);

INTEGER. The leading dimension of ab; ldab ≥ kl+ku+1.ldab

Output Parameters

REAL for single precision flavorsr, cDOUBLE PRECISION for double precision flavors.Arrays: r(m), c(n).If info = 0, or info > m, the array r contains the row scale factorsof the matrix A.If info = 0 , the array c contains the column scale factors of thematrix A.

REAL for single precision flavorsrowcndDOUBLE PRECISION for double precision flavors.If info = 0 or info > m, rowcnd contains the ratio of the smallestr(i) to the largest r(i).

REAL for single precision flavorscolcndDOUBLE PRECISION for double precision flavors.If info = 0, colcnd contains the ratio of the smallest c(i) to thelargest c(i).

REAL for single precision flavorsamaxDOUBLE PRECISION for double precision flavors.

461


Absolute value of the largest element of the matrix A.

INTEGER.infoIf info = 0, the execution is successful.If info = -i, the i-th parameter had an illegal value.If info = i and

i ≤ m, the i-th row of A is exactly zero;i > m, the (i-m)th column of A is exactly zero.



Specific details for the routine gbequ interface are as follows:


a

Holds the vector of length (m).r

Holds the vector of length (n).c



Application Notes

All the components of r and c are restricted to be between SMLNUM = smallest safe numberand BIGNUM= largest safe number. Use of these scaling factors is not guaranteed to reducethe condition number of A but works well in practice.

If rowcnd ≥ 0.1 and amax is neither too large nor too small, it is not worth scaling by r.

If colcnd ≥ 0.1, it is not worth scaling by c.


462


?poequComputes row and column scaling factors intendedto equilibrate a symmetric (Hermitian) positivedefinite matrix and reduce its condition number.

Syntax

Fortran 77:

call spoequ( n, a, lda, s, scond, amax, info )

call dpoequ( n, a, lda, s, scond, amax, info )

call cpoequ( n, a, lda, s, scond, amax, info )

call zpoequ( n, a, lda, s, scond, amax, info )

Fortran 95:

call poequ( a, s [,scond] [,amax] [,info] )

Description

This routine computes row and column scalings intended to equilibrate a symmetric (Hermitian)positive-definite matrix A and reduce its condition number (with respect to the two-norm). Theoutput array s returns scale factors computed as

These factors are chosen so that the scaled matrix B with elements bij=s(i)*aij*s(j) hasdiagonal elements equal to 1.

This choice of s puts the condition number of B within a factor n of the smallest possible conditionnumber over all possible diagonal scalings.

See ?laqsy auxiliary function that uses scaling factors computed by ?poequ.

Input Parameters


463


REAL for spoequaDOUBLE PRECISION for dpoequCOMPLEX for cpoequDOUBLE COMPLEX for zpoequ.Array: DIMENSION (lda,*).Contains the n-by-n symmetric or Hermitian positive definite matrix Awhose scaling factors are to be computed. Only diagonal elements ofA are referenced.The second dimension of a must be at least max(1,n).

INTEGER. The leading dimension of a; lda ≥ max(1, m).lda

Output Parameters

REAL for single precision flavorssDOUBLE PRECISION for double precision flavors.Array, DIMENSION (n).If info = 0, the array s contains the scale factors for A.

REAL for single precision flavorsscondDOUBLE PRECISION for double precision flavors.If info = 0, scond contains the ratio of the smallest s(i) to the largests(i).


INTEGER.infoIf info = 0, the execution is successful.If info = -i, the i-th parameter had an illegal value.If info = i, the i-th diagonal element of A is nonpositive.



Specific details for the routine poequ interface are as follows:


Holds the vector of length (n).s

464


Application Notes

If scond ≥ 0.1 and amax is neither too large nor too small, it is not worth scaling by s.


?ppequComputes row and column scaling factors intendedto equilibrate a symmetric (Hermitian) positivedefinite matrix in packed storage and reduce itscondition number.

Syntax

Fortran 77:

call sppequ( uplo, n, ap, s, scond, amax, info )

call dppequ( uplo, n, ap, s, scond, amax, info )

call cppequ( uplo, n, ap, s, scond, amax, info )

call zppequ( uplo, n, ap, s, scond, amax, info )

Fortran 95:

call ppequ( a, s [,scond] [,amax] [,uplo] [,info] )

Description

This routine computes row and column scalings intended to equilibrate a symmetric (Hermitian)positive definite matrix A in packed storage and reduce its condition number (with respect tothe two-norm). The output array s returns scale factors computed as

These factors are chosen so that the scaled matrix B with elements bij=s(i)*aij*s(j) hasdiagonal elements equal to 1.

465


This choice of s puts the condition number of B within a factor n of the smallest possible conditionnumber over all possible diagonal scalings.

See ?laqsp auxiliary function that uses scaling factors computed by ?ppequ.

Input Parameters

CHARACTER*1. Must be 'U' or 'L'.uploIndicates whether the upper or lower triangular part of A is packed inthe array ap:If uplo = 'U', the array ap stores the upper triangular part of thematrix A.If uplo = 'L', the array ap stores the lower triangular part of thematrix A.


REAL for sppequapDOUBLE PRECISION for dppequCOMPLEX for cppequDOUBLE COMPLEX for zppequ.Array, DIMENSION at least max(1,n(n+1)/2). The array ap containsthe upper or the lower triangular part of the matrix A (as specified byuplo) in packed storage (see Matrix Storage Schemes).

Output Parameters




INTEGER.infoIf info = 0, the execution is successful.

466


If info = -i, the i-th parameter had an illegal value.If info = i, the i-th diagonal element of A is nonpositive.



Specific details for the routine ppequ interface are as follows:


a



Application Notes



?pbequComputes row and column scaling factors intendedto equilibrate a symmetric (Hermitian)positive-definite band matrix and reduce itscondition number.

Syntax

Fortran 77:

call spbequ( uplo, n, kd, ab, ldab, s, scond, amax, info )

call dpbequ( uplo, n, kd, ab, ldab, s, scond, amax, info )

call cpbequ( uplo, n, kd, ab, ldab, s, scond, amax, info )

call zpbequ( uplo, n, kd, ab, ldab, s, scond, amax, info )

Fortran 95:

call pbequ( a, s [,scond] [,amax] [,uplo] [,info] )

467


Description

This routine computes row and column scalings intended to equilibrate a symmetric (Hermitian)positive definite matrix A in packed storage and reduce its condition number (with respect tothe two-norm). The output array s returns scale factors computed as

These factors are chosen so that the scaled matrix B with elements bij=s(i)*aij*s(j) hasdiagonal elements equal to 1. This choice of s puts the condition number of B within a factor nof the smallest possible condition number over all possible diagonal scalings.

See ?laqsb auxiliary function that uses scaling factors computed by ?pbequ.

Input Parameters

CHARACTER*1. Must be 'U' or 'L'.uploIndicates whether the upper or lower triangular part of A is packed inthe array ab:If uplo = 'U', the array ab stores the upper triangular part of thematrix A.If uplo = 'L', the array ab stores the lower triangular part of thematrix A.



A; kd ≥ 0.

kd

REAL for spbequabDOUBLE PRECISION for dpbequCOMPLEX for cpbequDOUBLE COMPLEX for zpbequ.Array, DIMENSION (ldab,*).The array ap contains either the upper or the lower triangular part ofthe matrix A (as specified by uplo) in band storage (see MatrixStorage Schemes).The second dimension of ab must be at least max(1, n).

468


INTEGER. The leading dimension of the array ab; ldab ≥ kd +1.ldab

Output Parameters




INTEGER.infoIf info = 0, the execution is successful.If info = -i, the i-th parameter had an illegal value.If info = i, the i-th diagonal element of A is nonpositive.



Specific details for the routine pbequ interface are as follows:


a



Application Notes



469


Driver RoutinesTable 3-3 lists the LAPACK driver routines for solving systems of linear equations with real orcomplex matrices.

Table 3-3 Driver Routines for Solving Systems of Linear Equations

Expert DriverSimple DriverMatrix type, storagescheme

?gesvx?gesvgeneral

?gbsvx?gbsvgeneral band

?gtsvx?gtsvgeneral tridiagonal

?posvx?posvsymmetric/Hermitianpositive-definite

?ppsvx?ppsvsymmetric/Hermitianpositive-definite, storage

?pbsvx?pbsvsymmetric/Hermitianpositive-definite, band

?ptsvx?ptsvsymmetric/Hermitianpositive-definite, tridiagonal

?sysvx/?hesvx?sysv/?hesvsymmetric/Hermitian indefinite

?spsvx/?hpsvx?spsv/?hpsvsymmetric/Hermitianindefinite, packed storage

?sysvx?sysvcomplex symmetric

?spsvx?spsvcomplex symmetric, packedstorage

In this table ? stands for s (single precision real), d (double precision real), c (single precisioncomplex), or z (double precision complex). In the description of ?gesv routine ? stands forcombined character codes ds and zc for the mixed precision subroutines.

470


?gesvComputes the solution to the system of linearequations with a square matrix A and multipleright-hand sides.

Syntax

Fortran 77:

call sgesv( n, nrhs, a, lda, ipiv, b, ldb, info )

call dgesv( n, nrhs, a, lda, ipiv, b, ldb, info )

call cgesv( n, nrhs, a, lda, ipiv, b, ldb, info )

call zgesv( n, nrhs, a, lda, ipiv, b, ldb, info )

call dsgesv( n, nrhs, a, lda, ipiv, b, ldb, x, ldx, work, swork, iter, info)

call zcgesv( n, nrhs, a, lda, ipiv, b, ldb, x, ldx, work, swork, iter, info)

Fortran 95:

call gesv( a, b [,ipiv] [,info] )

Description

This routine solves for X the system of linear equations A*X = B, where A is an n-by-n matrix,the columns of matrix B are individual right-hand sides, and the columns of X are thecorresponding solutions.

The LU decomposition with partial pivoting and row interchanges is used to factor A as A =P*L*U, where P is a permutation matrix, L is unit lower triangular, and U is upper triangular.The factored form of A is then used to solve the system of equations A*X = B.

The dsgesv and zcgesv are mixed precision iterative refinement subroutines for exploitingfast single precision hardware. They first attempt to factorize the matrix in single precision(dsgesv) or single complex precision (zcgesv) and use this factorization within an iterativerefinement procedure to produce a solution with double precision (dsgesv) / double complexprecision (zcgesv) normwise backward error quality (see below). If the approach fails themethod switches to a double precision or double complex precision factorization respectivelyand solve.

471


The iterative refinement is not going to be a winning strategy if the ratio single precisionperformance over double precision performance is too small. A reasonable strategy should takethe number of right-hand sides and the size of the matrix into account. This might be donewith a call to ilaenv in the future. At present, iterative refinement is implemented.

The iterative refinement process is stopped if

iter > itermax

or for all the RHS:

RNRM < SQRT(N)*XNRM*ANRM*EPS*BWDMAX,

where

• iter is the number of the current iteration in the iterativerefinement process

• RNRM is the infinity-norm of the residual

• XNRM is the infinity-norm of the solution

• ANRM is the infinity-operator-norm of the matrix A

• EPS is the machine epsilon returned by dlamch (‘Epsilon’).

The values itermax and BWDMAX are fixed to 30 and 1.0D+00 respectively.

Input Parameters

INTEGER. The number of linear equations, that is, the order of the

matrix A; n ≥ 0.

n



nrhs

REAL for sgesva, bDOUBLE PRECISION for dgesv and dsgesvCOMPLEX for cgesvDOUBLE COMPLEX for zgesv and zcgesv.Arrays: a(lda,*), b(ldb,*).The array a contains the n-by-n coefficient matrix A.The array b contains the n-by-nrhs matrix of right hand side matrix B.The second dimension of a must be at least max(1, n), the seconddimension of b at least max(1, nrhs).

472

INTEGER. The leading dimension of the array a; lda ≥ max(1, n).lda

INTEGER. The leading dimension of the array b; ldb ≥ max(1, n).ldb

INTEGER. The leading dimension of the array x; ldx ≥ max(1, n).ldx

DOUBLE PRECISION for dsgesvworkDOUBLE COMPLEX for zcgesv.Workspace array, DIMENSION (n*nrhs). This array is used to hold theresidual vectors.

SINGLE PRECISION for dsgesvsworkCOMPLEX for zcgesv.Workspace array, DIMENSION (n*(n+nrhs)). This array is used to usethe single precision matrix and the right-hand sides or solutions insingle precision.

Output Parameters

Overwritten by the factors L and U from the factorization of A = P*L*U;the unit diagonal elements of L are not stored.

a

If iterative refinement has been successfully used (info= 0 and iter

≥ 0, see description below), then A is unchanged.If double precision factorization has been used (info= 0 and iter <0, see description below), then the array A contains the factors L andU from the factorization A = P*L*U; the unit diagonal elements of Lare not stored.


INTEGER.ipivArray, DIMENSION at least max(1, n). The pivot indices that definethe permutation matrix P; row i of the matrix was interchanged withrow ipiv(i). Corresponds to the single precision factorization (if info=

0 and iter ≥ 0) or the double precision factorization (if info= 0 anditer < 0).

DOUBLE PRECISION for dsgesvxDOUBLE COMPLEX for zcgesv.Array, DIMENSION (ldx, nrhs). If info = 0, contains the n-by-nrhssolution matrix X.

INTEGER.iter

473

If iter < 0: iterative refinement has failed, double precisionfactorization has been performed

• If iter = -1: taking into account machine parameters, n, nrhs, itis a priori not worth working in single precision

• If iter = -2: overflow of an entry when moving from double tosingle precision

• If iter = -3: failure of sgetrf

• If iter = -31: stop the iterative refinement after the 30th iteration.

If iter > 0: iterative refinement has been sucessfully used. Returnsthe number of iterations.

INTEGER. If info=0, the execution is successful.infoIf info = -i, the i-th parameter had an illegal value.If info = i, U(i, i) (computed in double precision for mixed precisionsubroutines) is exactly zero. The factorization has been completed, butthe factor U is exactly singular, so the solution could not be computed.



Specific details for the routine gesv interface are as follows:




NOTE. Fortran 95 Interface is so far not available for the mixed precision subroutinesdsgesv/zcgesv.

474

?gesvxComputes the solution to the system of linearequations with a square matrix A and multipleright-hand sides, and provides error bounds on thesolution.

Syntax

Fortran 77:

call sgesvx( fact, trans, n, nrhs, a, lda, af, ldaf, ipiv, equed, r, c, b,ldb, x, ldx, rcond, ferr, berr, work, iwork, info )

call dgesvx( fact, trans, n, nrhs, a, lda, af, ldaf, ipiv, equed, r, c, b,ldb, x, ldx, rcond, ferr, berr, work, iwork, info )

call cgesvx( fact, trans, n, nrhs, a, lda, af, ldaf, ipiv, equed, r, c, b,ldb, x, ldx, rcond, ferr, berr, work, rwork, info )

call zgesvx( fact, trans, n, nrhs, a, lda, af, ldaf, ipiv, equed, r, c, b,ldb, x, ldx, rcond, ferr, berr, work, rwork, info )

Fortran 95:

call gesvx( a, b, x [,af] [,ipiv] [,fact] [,trans] [,equed] [,r] [,c] [,ferr][,berr] [,rcond] [,rpvgrw] [,info] )

Description

This routine uses the LU factorization to compute the solution to a real or complex system oflinear equations A*X = B, where A is an n-by-n matrix, the columns of matrix B are individualright-hand sides, and the columns of X are the corresponding solutions.

Error bounds on the solution and a condition estimate are also provided.

The routine ?gesvx performs the following steps:

1. If fact = 'E', real scaling factors r and c are computed to equilibrate the system:

trans = 'N': diag(r)*A*diag(c)*inv(diag(c))*X = diag(r)*B

trans = 'T': (diag(r)*A*diag(c))T*inv(diag(r))*X = diag(c)*B

trans = 'C': (diag(r)*A*diag(c))H*inv(diag(r))*X = diag(c)*B

475


Whether or not the system will be equilibrated depends on the scaling of the matrix A, butif equilibration is used, A is overwritten by diag(r)*A*diag(c) and B by diag(r)*B (iftrans='N') or diag(c)*B (if trans = 'T' or 'C').

2. If fact = 'N' or 'E', the LU decomposition is used to factor the matrix A (afterequilibration if fact = 'E') as A = P*L*U, where P is a permutation matrix, L is a unitlower triangular matrix, and U is upper triangular.

3. If some Ui,i= 0, so that U is exactly singular, then the routine returns with info = i.Otherwise, the factored form of A is used to estimate the condition number of the matrix A.If the reciprocal of the condition number is less than machine precision, info = n + 1 isreturned as a warning, but the routine still goes on to solve for X and compute error boundsas described below.

4. The system of equations is solved for X using the factored form of A.

5. Iterative refinement is applied to improve the computed solution matrix and calculate errorbounds and backward error estimates for it.

6. If equilibration was used, the matrix X is premultiplied by diag(c) (if trans = 'N') ordiag(r) (if trans = 'T' or 'C') so that it solves the original system before equilibration.

Input Parameters

CHARACTER*1. Must be 'F', 'N', or 'E'.factSpecifies whether or not the factored form of the matrix A is suppliedon entry, and if not, whether the matrix A should be equilibrated beforeit is factored.If fact = 'F': on entry, af and ipiv contain the factored form of A.If equed is not 'N', the matrix A has been equilibrated with scalingfactors given by r and c.

a, af, and ipiv are not modified.If fact = 'N', the matrix A will be copied to af and factored.If fact = 'E', the matrix A will be equilibrated if necessary, thencopied to af and factored.

CHARACTER*1. Must be 'N', 'T', or 'C'.transSpecifies the form of the system of equations:If trans = 'N', the system has the form A*X = B (No transpose).If trans = 'T', the system has the form AT*X = B (Transpose).If trans = 'C', the system has the form AH*X = B (Conjugatetranspose).

476


INTEGER. The number of linear equations; the order of the matrix A;

n ≥ 0.

n

INTEGER. The number of right hand sides; the number of columns of

the matrices B and X; nrhs ≥ 0.

nrhs

REAL for sgesvxa,af,b,workDOUBLE PRECISION for dgesvxCOMPLEX for cgesvxDOUBLE COMPLEX for zgesvx.Arrays: a(lda,*), af(ldaf,*), b(ldb,*), work(*).The array a contains the matrix A. If fact = 'F' and equed is not 'N',then A must have been equilibrated by the scaling factors in r and/orc. The second dimension of a must be at least max(1,n).The array af is an input argument if fact = 'F'. It contains thefactored form of the matrix A, that is, the factors L and U from thefactorization A = P*L*U as computed by ?getrf. If equed is not 'N',then af is the factored form of the equilibrated matrix A. The seconddimension of af must be at least max(1,n).The array b contains the matrix B whose columns are the right-handsides for the systems of equations. The second dimension of b mustbe at least max(1, nrhs).work(*) is a workspace array. The dimension of work must be at leastmax(1,4*n) for real flavors, and at least max(1,2*n) for complexflavors.




INTEGER.ipivArray, DIMENSION at least max(1, n). The array ipiv is an inputargument if fact = 'F'. It contains the pivot indices from thefactorization A = P*L*U as computed by ?getrf; row i of the matrixwas interchanged with row ipiv(i).

CHARACTER*1. Must be 'N', 'R', 'C', or 'B'.equedequed is an input argument if fact = 'F'. It specifies the form ofequilibration that was done:

477


If equed = 'N', no equilibration was done (always true if fact ='N').If equed = 'R', row equilibration was done and A has beenpremultiplied by diag(r).If equed = 'C', column equilibration was done and A has beenpostmultiplied by diag(c).If equed = 'B', both row and column equilibration was done; A hasbeen replaced by diag(r)*A*diag(c).

REAL for single precision flavorsr, cDOUBLE PRECISION for double precision flavors.Arrays: r(n), c(n). The array r contains the row scale factors for A,and the array c contains the column scale factors for A. These arraysare input arguments if fact = 'F' only; otherwise they are outputarguments.If equed = 'R' or 'B', A is multiplied on the left by diag(r); if equed= 'N' or 'C', r is not accessed.If fact = 'F' and equed = 'R' or 'B', each element of r must bepositive.If equed = 'C' or 'B', A is multiplied on the right by diag(c); ifequed = 'N' or 'R', c is not accessed.If fact = 'F' and equed = 'C' or 'B', each element of c must bepositive.

INTEGER. The first dimension of the output array x; ldx ≥ max(1,n).

ldx

INTEGER. Workspace array, DIMENSION at least max(1, n); used inreal flavors only.

iwork

REAL for single precision flavorsrworkDOUBLE PRECISION for double precision flavors.Workspace array, DIMENSION at least max(1, 2*n); used in complexflavors only.

Output Parameters

REAL for sgesvxxDOUBLE PRECISION for dgesvxCOMPLEX for cgesvxDOUBLE COMPLEX for zgesvx.

478


Array, DIMENSION (ldx,*).If info = 0 or info = n+1, the array x contains the solution matrixX to the original system of equations. Note that A and B are modified

on exit if equed ≠ 'N', and the solution to the equilibrated systemis:diag(C)-1*X, if trans = 'N' and equed = 'C' or 'B'; diag(R)-1*X,if trans = 'T' or 'C' and equed = 'R' or 'B'. The second dimensionof x must be at least max(1,nrhs).

Array a is not modified on exit if fact = 'F' or 'N', or if fact = 'E'

and equed = 'N'. If equed ≠ 'N', A is scaled on exit as follows:

a

equed = 'R': A = diag(R)*Aequed = 'C': A = A*diag(c)equed = 'B': A = diag(R)*A*diag(c).

If fact = 'N' or 'E', then af is an output argument and on exitreturns the factors L and U from the factorization A = PLU of the originalmatrix A (if fact = 'N') or of the equilibrated matrix A (if fact ='E'). See the description of a for the form of the equilibrated matrix.

af

Overwritten by diag(r)*B if trans = 'N' and equed = 'R' or 'B';boverwritten by diag(c)*B if trans = 'T' and equed = 'C' or 'B';not changed if equed = 'N'.

These arrays are output arguments if fact ≠ 'F'. See the descriptionof r, c in Input Arguments section.

r, c

REAL for single precision flavorsrcondDOUBLE PRECISION for double precision flavors.An estimate of the reciprocal condition number of the matrix A afterequilibration (if done). The routine sets rcond = 0 if the estimateunderflows; in this case the matrix is singular (to working precision).However, anytime rcond is small compared to 1.0, for the workingprecision, the matrix may be poorly conditioned or even singular.

REAL for single precision flavors.ferr, berrDOUBLE PRECISION for double precision flavors.Arrays, DIMENSION at least max(1, nrhs). Contain the component-wiseforward and relative backward errors, respectively, for each solutionvector.

479


If fact = 'N' or 'E', then ipiv is an output argument and on exitcontains the pivot indices from the factorization A = P*L*U of theoriginal matrix A (if fact = 'N') or of the equilibrated matrix A (iffact = 'E').

ipiv

If fact ≠ 'F' , then equed is an output argument. It specifies theform of equilibration that was done (see the description of equed inInput Arguments section).

equed

On exit, work(1) for real flavors, or rwork(1) for complex flavors,contains the reciprocal pivot growth factor norm(A)/norm(U). The "maxabsolute element" norm is used. If work(1) for real flavors, or

work, rwork

rwork(1) for complex flavors is much less than 1, then the stabilityof the LU factorization of the (equilibrated) matrix A could be poor. Thisalso means that the solution x, condition estimator rcond, and forwarderror bound ferr could be unreliable. If factorization fails with 0 <

info ≤ n, then work(1) for real flavors, or rwork(1) for complexflavors contains the reciprocal pivot growth factor for the leading infocolumns of A.


If info = i, and i ≤ n, then U(i, i) is exactly zero. The factorizationhas been completed, but the factor U is exactly singular, so the solutionand error bounds could not be computed; rcond = 0 is returned.If info = i, and i = n+1, then U is nonsingular, but rcond is lessthan machine precision, meaning that the matrix is singular to workingprecision. Nevertheless, the solution and error bounds are computedbecause there are a number of situations where the computed solutioncan be more accurate than the value of rcond would suggest.



Specific details for the routine gesvx interface are as follows:



480




Holds the vector of length (n). Default value for each element is r(i)= 1.0_WP.

r

Holds the vector of length (n). Default value for each element is c(i)= 1.0_WP.

c



Must be 'N', 'E', or 'F'. The default value is 'N'. If fact = 'F',then both arguments af and ipiv must be present; otherwise, an erroris returned.

fact


Must be 'N', 'B', 'C', or 'R'. The default value is 'N'.equed

Real value that contains the reciprocal pivot growth factornorm(A)/norm(U).

rpvgrw

?gbsvComputes the solution to the system of linearequations with a band matrix A and multipleright-hand sides.

Syntax

Fortran 77:

call sgbsv( n, kl, ku, nrhs, ab, ldab, ipiv, b, ldb, info )

call dgbsv( n, kl, ku, nrhs, ab, ldab, ipiv, b, ldb, info )

call cgbsv( n, kl, ku, nrhs, ab, ldab, ipiv, b, ldb, info )

call zgbsv( n, kl, ku, nrhs, ab, ldab, ipiv, b, ldb, info )

Fortran 95:

call gbsv( a, b [,kl] [,ipiv] [,info] )

481


Description

This routine solves for X the real or complex system of linear equations A*X = B, where A isan n-by-n band matrix with kl subdiagonals and ku superdiagonals, the columns of matrix Bare individual right-hand sides, and the columns of X are the corresponding solutions.

The LU decomposition with partial pivoting and row interchanges is used to factor A as A = L*U,where L is a product of permutation and unit lower triangular matrices with kl subdiagonals,and U is upper triangular with kl+ku superdiagonals. The factored form of A is then used tosolve the system of equations A*X = B.

Input Parameters

INTEGER. The order of A. The number of rows in B; n ≥ 0.n



INTEGER. The number of right-hand sides. The number of columns in

B; nrhs ≥ 0.

nrhs

REAL for sgbsvab, bDOUBLE PRECISION for dgbsvCOMPLEX for cgbsvDOUBLE COMPLEX for zgbsv.Arrays: ab(ldab,*), b(ldb,*).The array ab contains the matrix A in band storage (see Matrix StorageSchemes). The second dimension of ab must be at least max(1, n).The array b contains the matrix B whose columns are the right-handsides for the systems of equations. The second dimension of b mustbe at least max(1,nrhs).

INTEGER. The first dimension of the array ab. (ldab ≥ 2kl + ku +1)ldab


Output Parameters

Overwritten by L and U. The diagonal and kl + ku superdiagonals of Uare stored in the first 1 + kl + ku rows of ab. The multipliers used toform L are stored in the next kl rows.

ab


482


INTEGER.ipivArray, DIMENSION at least max(1, n). The pivot indices: row i wasinterchanged with row ipiv(i).

INTEGER. If info = 0, the execution is successful.infoIf info = -i, the i-th parameter had an illegal value.If info = i, U(i, i) is exactly zero. The factorization has beencompleted, but the factor U is exactly singular, so the solution couldnot be computed.



Specific details for the routine gbsv interface are as follows:


a





483


?gbsvxComputes the solution to the real or complexsystem of linear equations with a band matrix Aand multiple right-hand sides, and provides errorbounds on the solution.

Syntax

Fortran 77:

call sgbsvx( fact, trans, n, kl, ku, nrhs, ab, ldab, afb, ldafb, ipiv, equed,r, c, b, ldb, x, ldx, rcond, ferr, berr, work, iwork, info )

call dgbsvx( fact, trans, n, kl, ku, nrhs, ab, ldab, afb, ldafb, ipiv, equed,r, c, b, ldb, x, ldx, rcond, ferr, berr, work, iwork, info )

call cgbsvx( fact, trans, n, kl, ku, nrhs, ab, ldab, afb, ldafb, ipiv, equed,r, c, b, ldb, x, ldx, rcond, ferr, berr, work, rwork, info )

call zgbsvx( fact, trans, n, kl, ku, nrhs, ab, ldab, afb, ldafb, ipiv, equed,r, c, b, ldb, x, ldx, rcond, ferr, berr, work, rwork, info )

Fortran 95:

call gbsvx( a, b, x [,kl] [,af] [,ipiv] [,fact] [,trans] [,equed] [,r] [,c][,ferr] [,berr] [,rcond] [,rpvgrw] [,info] )

Description

This routine uses the LU factorization to compute the solution to a real or complex system oflinear equations A*X = B, AT*X = B, or AH*X = B, where A is a band matrix of order n with kl

subdiagonals and ku superdiagonals, the columns of matrix B are individual right-hand sides,and the columns of X are the corresponding solutions.


The routine ?gbsvx performs the following steps:

1. If fact = 'E', real scaling factors r and c are computed to equilibrate the system:

trans = 'N': diag(r)*A*diag(c) *inv(diag(c))*X = diag(r)*B

trans = 'T': (diag(r)*A*diag(c))T *inv(diag(r))*X = diag(c)*B

trans = 'C': (diag(r)*A*diag(c))H *inv(diag(r))*X = diag(c)*B

484


Whether or not the system will be equilibrated depends on the scaling of the matrix A, butif equilibration is used, A is overwritten by diag(r)*A*diag(c) and B by diag(r)*B (iftrans='N') or diag(c)*B (if trans = 'T' or 'C').

2. If fact = 'N' or 'E', the LU decomposition is used to factor the matrix A (after equilibrationif fact = 'E') as A = L*U, where L is a product of permutation and unit lower triangularmatrices with kl subdiagonals, and U is upper triangular with kl+ku superdiagonals.

3. If some Ui,i = 0, so that U is exactly singular, then the routine returns with info = i.Otherwise, the factored form of A is used to estimate the condition number of the matrix A.If the reciprocal of the condition number is less than machine precision, info = n + 1 isreturned as a warning, but the routine still goes on to solve for X and compute error boundsas described below.



6. If equilibration was used, the matrix X is premultiplied by diag(c) (if trans = 'N') ordiag(r) (if trans = 'T' or 'C') so that it solves the original system before equilibration.

Input Parameters

CHARACTER*1. Must be 'F', 'N', or 'E'.factSpecifies whether or not the factored form of the matrix A is suppliedon entry, and if not, whether the matrix A should be equilibrated beforeit is factored.If fact = 'F': on entry, afb and ipiv contain the factored form ofA. If equed is not 'N', the matrix A has been equilibrated with scalingfactors given by r and c.

ab, afb, and ipiv are not modified.If fact = 'N', the matrix A will be copied to afb and factored.If fact = 'E', the matrix A will be equilibrated if necessary, thencopied to afb and factored.


485


INTEGER. The number of linear equations, the order of the matrix A;

n ≥ 0.

n



INTEGER. The number of right hand sides, the number of columns of


nrhs

REAL for sgesvxab,afb,b,workDOUBLE PRECISION for dgesvxCOMPLEX for cgesvxDOUBLE COMPLEX for zgesvx.Arrays: a(lda,*), af(ldaf,*), b(ldb,*), work(*).The array ab contains the matrix A in band storage (see Matrix StorageSchemes). The second dimension of ab must be at least max(1, n).If fact = 'F' and equed is not 'N', then A must have beenequilibrated by the scaling factors in r and/or c.The array afb is an input argument if fact = 'F'. The seconddimension of afb must be at least max(1,n). It contains the factoredform of the matrix A, that is, the factors L and U from the factorizationA = L*U as computed by ?gbtrf. U is stored as an upper triangularband matrix with kl + ku superdiagonals in the first 1 + kl + ku rowsof afb. The multipliers used during the factorization are stored in thenext kl rows. If equed is not 'N', then afb is the factored form of theequilibrated matrix A.The array b contains the matrix B whose columns are the right-handsides for the systems of equations. The second dimension of b mustbe at least max(1, nrhs).work(*) is a workspace array. The dimension of work must be at leastmax(1,3*n) for real flavors, and at least max(1,2*n) for complexflavors.

INTEGER. The first dimension of ab; ldab ≥ kl+ku+1.ldab

INTEGER. The first dimension of afb; ldafb ≥ 2*kl+ku+1.ldafb


INTEGER.ipiv

486


Array, DIMENSION at least max(1, n). The array ipiv is an inputargument if fact = 'F'. It contains the pivot indices from thefactorization A = L*U as computed by ?gbtrf; row i of the matrixwas interchanged with row ipiv(i).

CHARACTER*1. Must be 'N', 'R', 'C', or 'B'.equedequed is an input argument if fact = 'F'. It specifies the form ofequilibration that was done:If equed = 'N', no equilibration was done (always true if fact ='N').If equed = 'R', row equilibration was done and A has beenpremultiplied by diag(r).If equed = 'C', column equilibration was done and A has beenpostmultiplied by diag(c).if equed = 'B', both row and column equilibration was done; A hasbeen replaced by diag(r)*A*diag(c).

REAL for single precision flavorsr, cDOUBLE PRECISION for double precision flavors.Arrays: r(n), c(n).The array r contains the row scale factors for A, and the array c containsthe column scale factors for A. These arrays are input arguments iffact = 'F' only; otherwise they are output arguments.If equed = 'R' or 'B', A is multiplied on the left by diag(r); if equed= 'N' or 'C', r is not accessed.If fact = 'F' and equed = 'R' or 'B', each element of r must bepositive.If equed = 'C' or 'B', A is multiplied on the right by diag(c); ifequed = 'N' or 'R', c is not accessed.If fact = 'F' and equed = 'C' or 'B', each element of c must bepositive.


ldx


iwork

REAL for single precision flavorsrworkDOUBLE PRECISION for double precision flavors.Workspace array, DIMENSION at least max(1, n); used in complexflavors only.

487


Output Parameters

REAL for sgbsvxxDOUBLE PRECISION for dgbsvxCOMPLEX for cgbsvxDOUBLE COMPLEX for zgbsvx.Array, DIMENSION (ldx,*).If info = 0 or info = n+1, the array x contains the solution matrixX to the original system of equations. Note that A and B are modified

on exit if equed ≠ 'N', and the solution to the equilibrated systemis: inv(diag(c))*X, if trans = 'N' and equed = 'C' or 'B';inv(diag(r))*X, if trans = 'T' or 'C' and equed = 'R' or 'B'.The second dimension of x must be at least max(1,nrhs).

Array ab is not modified on exit if fact = 'F' or 'N', or if fact ='E' and equed = 'N'.

ab

If equed ≠ 'N', A is scaled on exit as follows:equed = 'R': A = diag(r)*Aequed = 'C': A = A*diag(c)equed = 'B': A = diag(r)*A*diag(c).

If fact = 'N' or 'E', then afb is an output argument and on exitreturns details of the LU factorization of the original matrix A (if fact= 'N') or of the equilibrated matrix A (if fact = 'E'). See thedescription of ab for the form of the equilibrated matrix.

afb

Overwritten by diag(r)*b if trans = 'N' and equed = 'R' or 'B';boverwritten by diag(c)*b if trans = 'T' and equed = 'C' or 'B';not changed if equed = 'N'.

These arrays are output arguments if fact ≠ 'F'. See the descriptionof r, c in Input Arguments section.

r, c

REAL for single precision flavorsrcondDOUBLE PRECISION for double precision flavors.An estimate of the reciprocal condition number of the matrix A afterequilibration (if done).If rcond is less than the machine precision (in particular, if rcond =0),the matrix is singular to working precision. This condition is indicatedby a return code of info>0.

REAL for single precision flavorsferr, berr

488


DOUBLE PRECISION for double precision flavors.Arrays, DIMENSION at least max(1, nrhs). Contain the component-wiseforward and relative backward errors, respectively, for each solutionvector.

If fact = 'N' or 'E', then ipiv is an output argument and on exitcontains the pivot indices from the factorization A = L*U of the originalmatrix A (if fact = 'N') or of the equilibrated matrix A (if fact ='E').

ipiv


equed

On exit, work(1) for real flavors, or rwork(1) for complex flavors,contains the reciprocal pivot growth factor norm(A)/norm(U). The "maxabsolute element" norm is used. If work(1) for real flavors, or

work, rwork

rwork(1) for complex flavors is much less than 1, then the stabilityof the LU factorization of the (equilibrated) matrix A could be poor. Thisalso means that the solution x, condition estimator rcond, and forwarderror bound ferr could be unreliable. If factorization fails with 0 <

info ≤ n, then work(1) for real flavors, or rwork(1) for complexflavors contains the reciprocal pivot growth factor for the leading infocolumns of A.


If info = i, and i ≤ n, then U(i, i) is exactly zero. The factorizationhas been completed, but the factor U is exactly singular, so the solutionand error bounds could not be computed; rcond = 0 is returned. Ifinfo = i, and i = n+1, then U is nonsingular, but rcond is less thanmachine precision, meaning that the matrix is singular to workingprecision. Nevertheless, the solution and error bounds are computedbecause there are a number of situations where the computed solutioncan be more accurate than the value of rcond would suggest.



489

Specific details for the routine gbsvx interface are as follows:


a



Stands for argument ab in Fortan 77 interface. Holds the array AF ofsize (2*kl+ku+1,n).

af


Holds the vector of length (n). Default value for each element is r(i)= 1.0_WP.

r

Holds the vector of length (n). Default value for each element is c(i)= 1.0_WP.

c




Must be 'N', 'B', 'C', or 'R'. The default value is 'N'.equed

Must be 'N', 'E', or 'F'. The default value is 'N'. If fact = 'F',then both arguments af and ipiv must be present; otherwise, an erroris returned.

fact

Real value that contains the reciprocal pivot growth factornorm(A)/norm(U).

rpvgrw



490


?gtsvComputes the solution to the system of linearequations with a tridiagonal matrix A and multipleright-hand sides.

Syntax

Fortran 77:

call sgtsv( n, nrhs, dl, d, du, b, ldb, info )

call dgtsv( n, nrhs, dl, d, du, b, ldb, info )

call cgtsv( n, nrhs, dl, d, du, b, ldb, info )

call zgtsv( n, nrhs, dl, d, du, b, ldb, info )

Fortran 95:

call gtsv( dl, d, du, b [,info] )

Description

This routine solves for X the system of linear equations A*X = B, where A is an n-by-n tridiagonalmatrix, the columns of matrix B are individual right-hand sides, and the columns of X are thecorresponding solutions. The routine uses Gaussian elimination with partial pivoting.

Note that the equation AT*X = B may be solved by interchanging the order of the argumentsdu and dl.

Input Parameters

INTEGER. The order of A, the number of rows in B; n ≥ 0.n

INTEGER. The number of right-hand sides, the number of columns in

B; nrhs ≥ 0.

nrhs

REAL for sgtsvdl, d, du, bDOUBLE PRECISION for dgtsvCOMPLEX for cgtsvDOUBLE COMPLEX for zgtsv.Arrays: dl(n - 1), d(n), du(n - 1), b(ldb,*).The array dl contains the (n - 1) subdiagonal elements of A.The array d contains the diagonal elements of A.

491


The array du contains the (n - 1) superdiagonal elements of A.The array b contains the matrix B whose columns are the right-handsides for the systems of equations. The second dimension of b mustbe at least max(1,nrhs).


Output Parameters

Overwritten by the (n-2) elements of the second superdiagonal of theupper triangular matrix U from the LU factorization of A. These elementsare stored in dl(1), ... , dl(n-2).

dl

Overwritten by the n diagonal elements of U.d

Overwritten by the (n-1) elements of the first superdiagonal of U.du


INTEGER. If info = 0, the execution is successful.infoIf info = -i, the i-th parameter had an illegal value.If info = i, U(i, i) is exactly zero, and the solution has not beencomputed. The factorization has not been completed unless i = n.



Specific details for the routine gtsv interface are as follows:





492


?gtsvxComputes the solution to the real or complexsystem of linear equations with a tridiagonal matrixA and multiple right-hand sides, and provides errorbounds on the solution.

Syntax

Fortran 77:

call sgtsvx( fact, trans, n, nrhs, dl, d, du, dlf, df, duf, du2, ipiv, b, ldb,x, ldx, rcond, ferr, berr, work, iwork, info )

call dgtsvx( fact, trans, n, nrhs, dl, d, du, dlf, df, duf, du2, ipiv, b, ldb,x, ldx, rcond, ferr, berr, work, iwork, info )

call cgtsvx( fact, trans, n, nrhs, dl, d, du, dlf, df, duf, du2, ipiv, b, ldb,x, ldx, rcond, ferr, berr, work, rwork, info )

call zgtsvx( fact, trans, n, nrhs, dl, d, du, dlf, df, duf, du2, ipiv, b, ldb,x, ldx, rcond, ferr, berr, work, rwork, info )

Fortran 95:

call gtsvx( dl, d, du, b, x [,dlf] [,df] [,duf] [,du2] [,ipiv] [,fact] [,trans][,ferr] [,berr] [,rcond] [,info] )

Description

This routine uses the LU factorization to compute the solution to a real or complex system oflinear equations A*X = B, AT*X = B, or AH*X = B, where A is a tridiagonal matrix of order n,the columns of matrix B are individual right-hand sides, and the columns of X are thecorresponding solutions.


The routine ?gtsvx performs the following steps:

1. If fact = 'N', the LU decomposition is used to factor the matrix A as A = L*U, where Lis a product of permutation and unit lower bidiagonal matrices and U is an upper triangularmatrix with nonzeroes in only the main diagonal and first two superdiagonals.

493


2. If some Ui,i = 0, so that U is exactly singular, then the routine returns with info = i.Otherwise, the factored form of A is used to estimate the condition number of the matrix A.If the reciprocal of the condition number is less than machine precision, info = n + 1 isreturned as a warning, but the routine still goes on to solve for X and compute error boundsas described below.



Input Parameters

CHARACTER*1. Must be 'F' or 'N'.factSpecifies whether or not the factored form of the matrix A has beensupplied on entry.If fact = 'F': on entry, dlf, df, duf, du2, and ipiv contain thefactored form of A; arrays dl, d, du, dlf, df, duf, du2, and ipiv willnot be modified.If fact = 'N', the matrix A will be copied to dlf, df, and duf andfactored.


INTEGER. The number of linear equations, the order of the matrix A;

n ≥ 0.

n

INTEGER. The number of right hand sides, the number of columns of


nrhs

REAL for sgtsvxdl,d,du,dlf,df,duf,du2,b, DOUBLE PRECISION for dgtsvx

COMPLEX for cgtsvxDOUBLE COMPLEX for zgtsvx.x,workArrays:dl, dimension (n -1), contains the subdiagonal elements of A.d, dimension (n), contains the diagonal elements of A.du, dimension (n -1), contains the superdiagonal elements of A.

494


dlf, dimension (n -1). If fact = 'F' , then dlf is an input argumentand on entry contains the (n -1) multipliers that define the matrix Lfrom the LU factorization of A as computed by ?gttrf.df, dimension (n). If fact = 'F' , then df is an input argument andon entry contains the n diagonal elements of the upper triangular matrixU from the LU factorization of A.duf, dimension (n -1). If fact = 'F' , then duf is an input argumentand on entry contains the (n -1) elements of the first superdiagonal ofU.du2, dimension (n -2). If fact = 'F' , then du2 is an input argumentand on entry contains the (n-2) elements of the second superdiagonalof U.b(ldb*) contains the right-hand side matrix B. The second dimensionof b must be at least max(1, nrhs).x(ldx*) contains the solution matrix X. The second dimension of xmust be at least max(1, nrhs).work(*) is a workspace array;the dimension of work must be at least max(1, 3*n) for real flavorsand max(1, 2*n) for complex flavors.



INTEGER.ipivArray, DIMENSION at least max(1, n). If fact = 'F' , then ipiv isan input argument and on entry contains the pivot indices, as returnedby ?gttrf.


REAL for cgtsvxrworkDOUBLE PRECISION for zgtsvx.Workspace array, DIMENSION (n). Used for complex flavors only.

Output Parameters

REAL for sgtsvxxDOUBLE PRECISION for dgtsvxCOMPLEX for cgtsvxDOUBLE COMPLEX for zgtsvx.Array, DIMENSION (ldx,*).

495


If info = 0 or info = n+1, the array x contains the solution matrixX. The second dimension of x must be at least max(1, nrhs).

If fact = 'N' , then dlf is an output argument and on exit containsthe (n-1) multipliers that define the matrix L from the LU factorizationof A.

dlf

If fact = 'N' , then df is an output argument and on exit containsthe n diagonal elements of the upper triangular matrix U from the LUfactorization of A.

df

If fact = 'N' , then duf is an output argument and on exit containsthe (n-1) elements of the first superdiagonal of U.

duf

If fact = 'N' , then du2 is an output argument and on exit containsthe (n-2) elements of the second superdiagonal of U.

du2

The array ipiv is an output argument if fact = 'N' and, on exit,contains the pivot indices from the factorization A = L*U ; row i ofthe matrix was interchanged with row ipiv(i). The value of ipiv(i)will always be i or i+1; ipiv(i)=i indicates a row interchange wasnot required.

ipiv

REAL for single precision flavorsrcondDOUBLE PRECISION for double precision flavors.An estimate of the reciprocal condition number of the matrix A. If rcondis less than the machine precision (in particular, if rcond =0), the matrixis singular to working precision. This condition is indicated by a returncode of info>0.



If info = i, and i ≤ n, then U(i, i) is exactly zero. The factorizationhas not been completed unless i = n, but the factor U is exactlysingular, so the solution and error bounds could not be computed;rcond = 0 is returned. If info = i, and i = n + 1, then U isnonsingular, but rcond is less than machine precision, meaning thatthe matrix is singular to working precision. Nevertheless, the solution

496


and error bounds are computed because there are a number ofsituations where the computed solution can be more accurate than thevalue of rcond would suggest.



Specific details for the routine gtsvx interface are as follows:






Holds the vector of length (n-1).dlf


Holds the vector of length (n-1).duf





Must be 'N' or 'F'. The default value is 'N'. If fact = 'F', then thearguments dlf, df, duf, du2, and ipiv must be present; otherwise,an error is returned.

fact


497


?posvComputes the solution to the system of linearequations with a symmetric or Hermitianpositive-definite matrix A and multiple right-handsides.

Syntax

Fortran 77:

call sposv( uplo, n, nrhs, a, lda, b, ldb, info )

call dposv( uplo, n, nrhs, a, lda, b, ldb, info )

call cposv( uplo, n, nrhs, a, lda, b, ldb, info )

call zposv( uplo, n, nrhs, a, lda, b, ldb, info )

Fortran 95:

call posv( a, b [,uplo] [,info] )

Description

This routine solves for X the real or complex system of linear equations A*X = B, where A isan n-by-n symmetric/Hermitian positive-definite matrix, the columns of matrix B are individualright-hand sides, and the columns of X are the corresponding solutions.

The Cholesky decomposition is used to factor A as

A = UT*U (real flavors) and A = UH*U (complex flavors), if uplo = 'U'

or A = L*LT (real flavors) and A = L*LH (complex flavors), if uplo = 'L',

where U is an upper triangular matrix and L is a lower triangular matrix. The factored form ofA is then used to solve the system of equations A*X = B.

Input Parameters

CHARACTER*1. Must be 'U' or 'L'.uploIndicates whether the upper or lower triangular part of A is stored:If uplo = 'U', the upper triangle of A is stored.If uplo = 'L', the lower triangle of A is stored.


498



B; nrhs ≥ 0.

nrhs

REAL for sposva, bDOUBLE PRECISION for dposvCOMPLEX for cposvDOUBLE COMPLEX for zposv.Arrays: a(lda,*), b(ldb,*). The array a contains the upper or thelower triangular part of the matrix A (see uplo). The second dimensionof a must be at least max(1, n).The array b contains the matrix B whose columns are the right-handsides for the systems of equations. The second dimension of b mustbe at least max(1,nrhs).



Output Parameters

If info = 0, the upper or lower triangular part of a is overwritten bythe Cholesky factor U or L, as specified by uplo.

a


INTEGER. If info = 0, the execution is successful.infoIf info = -i, the i-th parameter had an illegal value.If info = i, the leading minor of order i (and therefore the matrix Aitself) is not positive definite, so the factorization could not becompleted, and the solution has not been computed.



Specific details for the routine posv interface are as follows:




499


?posvxUses the Cholesky factorization to compute thesolution to the system of linear equations with asymmetric or Hermitian positive definite matrix A,and provides error bounds on the solution.

Syntax

Fortran 77:

call sposvx( fact, uplo, n, nrhs, a, lda, af, ldaf, equed, s, b, ldb, x, ldx,rcond, ferr, berr, work, iwork, info )

call dposvx( fact, uplo, n, nrhs, a, lda, af, ldaf, equed, s, b, ldb, x, ldx,rcond, ferr, berr, work, iwork, info )

call cposvx( fact, uplo, n, nrhs, a, lda, af, ldaf, equed, s, b, ldb, x, ldx,rcond, ferr, berr, work, rwork, info )

call zposvx( fact, uplo, n, nrhs, a, lda, af, ldaf, equed, s, b, ldb, x, ldx,rcond, ferr, berr, work, rwork, info )

Fortran 95:

call posvx( a, b, x [,uplo] [,af] [,fact] [,equed] [,s] [,ferr] [,berr] [,rcond][,info] )

Description

This routine uses the Cholesky factorization A=UT*U (real flavors) / A=UH*U (complex flavors)or A=L*LT (real flavors) / A=L*LH (complex flavors) to compute the solution to a real or complexsystem of linear equations A*X = B, where A is a n-by-n real symmetric/Hermitian positivedefinite matrix, the columns of matrix B are individual right-hand sides, and the columns of Xare the corresponding solutions.


The routine ?posvx performs the following steps:

1. If fact = 'E', real scaling factors s are computed to equilibrate the system:

diag(s)*A*diag(s)*inv(diag(s))*X = diag(s)*B.

Whether or not the system will be equilibrated depends on the scaling of the matrix A, butif equilibration is used, A is overwritten by diag(s)*A*diag(s) and B by diag(s)*B.

500


2. If fact = 'N' or 'E', the Cholesky decomposition is used to factor the matrix A (afterequilibration if fact = 'E') as

A = UT*U (real), A = UH*U (complex), if uplo = 'U',

or A = L*LT (real), A = L*LH (complex) , if uplo = 'L',

where U is an upper triangular matrix and L is a lower triangular matrix.

3. If the leading i-by-i principal minor is not positive-definite, then the routine returns withinfo = i. Otherwise, the factored form of A is used to estimate the condition number ofthe matrix A. If the reciprocal of the condition number is less than machine precision, info= n + 1 is returned as a warning, but the routine still goes on to solve for X and computeerror bounds as described below.



6. If equilibration was used, the matrix X is premultiplied by diag(s) so that it solves theoriginal system before equilibration.

Input Parameters

CHARACTER*1. Must be 'F', 'N', or 'E'.factSpecifies whether or not the factored form of the matrix A is suppliedon entry, and if not, whether the matrix A should be equilibrated beforeit is factored.If fact = 'F': on entry, af contains the factored form of A. If equed= 'Y', the matrix A has been equilibrated with scaling factors givenby s.

a and af will not be modified.If fact = 'N', the matrix A will be copied to af and factored.If fact = 'E', the matrix A will be equilibrated if necessary, thencopied to af and factored.



501



B; nrhs ≥ 0.

nrhs

REAL for sposvxa,af,b,workDOUBLE PRECISION for dposvxCOMPLEX for cposvxDOUBLE COMPLEX for zposvx.Arrays: a(lda,*), af(ldaf,*), b(ldb,*), work(*).The array a contains the matrix A as specified by uplo. If fact = 'F'and equed = 'Y', then A must have been equilibrated by the scalingfactors in s, and a must contain the equilibrated matrixdiag(s)*A*diag(s). The second dimension of a must be at leastmax(1,n).The array af is an input argument if fact = 'F'. It contains thetriangular factor U or L from the Cholesky factorization of A in the samestorage format as A. If equed is not 'N', then af is the factored formof the equilibrated matrix diag(s)*A*diag(s). The second dimensionof af must be at least max(1,n).The array b contains the matrix B whose columns are the right-handsides for the systems of equations. The second dimension of b mustbe at least max(1, nrhs).work(*) is a workspace array. The dimension of work must be at leastmax(1,3*n) for real flavors, and at least max(1,2*n) for complexflavors.




CHARACTER*1. Must be 'N' or 'Y'.equedequed is an input argument if fact = 'F'. It specifies the form ofequilibration that was done:if equed = 'N', no equilibration was done (always true if fact ='N');if equed = 'Y', equilibration was done and A has been replaced bydiag(s)*A*diag(s).

REAL for single precision flavorssDOUBLE PRECISION for double precision flavors.

502


Array, DIMENSION (n). The array s contains the scale factors for A. Thisarray is an input argument if fact = 'F' only; otherwise it is an outputargument.If equed = 'N', s is not accessed.If fact = 'F' and equed = 'Y', each element of s must be positive.


ldx


iwork

REAL for cposvxrworkDOUBLE PRECISION for zposvx.Workspace array, DIMENSION at least max(1, n); used in complexflavors only.

Output Parameters

REAL for sposvxxDOUBLE PRECISION for dposvxCOMPLEX for cposvxDOUBLE COMPLEX for zposvx.Array, DIMENSION (ldx,*).If info = 0 or info = n+1, the array x contains the solution matrixX to the original system of equations. Note that if equed = 'Y', A andB are modified on exit, and the solution to the equilibrated system isinv(diag(s))*X. The second dimension of x must be at leastmax(1,nrhs).

Array a is not modified on exit if fact = 'F' or 'N', or if fact = 'E'and equed = 'N'.

a

If fact = 'E' and equed = 'Y', A is overwritten bydiag(s)*A*diag(s).

If fact = 'N' or 'E', then af is an output argument and on exitreturns the triangular factor U or L from the Cholesky factorizationA=UT*U or A=L*LT (real routines), A=UH*U or A=L*LH (complex routines)

af

of the original matrix A (if fact = 'N'), or of the equilibrated matrixA (if fact = 'E'). See the description of a for the form of theequilibrated matrix.

503


Overwritten by diag(s)*B , if equed = 'Y'; not changed if equed ='N'.

b

This array is an output argument if fact ≠ 'F'. See the descriptionof s in Input Arguments section.

s

REAL for single precision flavorsrcondDOUBLE PRECISION for double precision flavors.An estimate of the reciprocal condition number of the matrix A afterequilibration (if done). If rcond is less than the machine precision (inparticular, if rcond =0), the matrix is singular to working precision.This condition is indicated by a return code of info>0.

REAL for single precision flavorsferr, berrDOUBLE PRECISION for double precision flavors.Arrays, DIMENSION at least max(1, nrhs). Contain the component-wiseforward and relative backward errors, respectively, for each solutionvector.


equed


If info = i, and i ≤ n, the leading minor of order i (and thereforethe matrix A itself) is not positive-definite, so the factorization couldnot be completed, and the solution and error bounds could not becomputed; rcond =0 is returned.If info = i, and i = n + 1, then U is nonsingular, but rcond is lessthan machine precision, meaning that the matrix is singular to workingprecision. Nevertheless, the solution and error bounds are computedbecause there are a number of situations where the computed solutioncan be more accurate than the value of rcond would suggest.



Specific details for the routine posvx interface are as follows:

504






Holds the vector of length (n). Default value for each element is s(i)= 1.0_WP.

s




Must be 'N', 'E', or 'F'. The default value is 'N'. If fact = 'F',then af must be present; otherwise, an error is returned.

fact

Must be 'N' or 'Y'. The default value is 'N'.equed

?ppsvComputes the solution to the system of linearequations with a symmetric (Hermitian) positivedefinite packed matrix A and multiple right-handsides.

Syntax

Fortran 77:

call sppsv( uplo, n, nrhs, ap, b, ldb, info )

call dppsv( uplo, n, nrhs, ap, b, ldb, info )

call cppsv( uplo, n, nrhs, ap, b, ldb, info )

call zppsv( uplo, n, nrhs, ap, b, ldb, info )

Fortran 95:

call ppsv( a, b [,uplo] [,info] )

505


Description

This routine solves for X the real or complex system of linear equations A*X = B, where A isan n-by-n real symmetric/Hermitian positive-definite matrix stored in packed format, thecolumns of matrix B are individual right-hand sides, and the columns of X are the correspondingsolutions.




where U is an upper triangular matrix and L is a lower triangular matrix. The factored form ofA is then used to solve the system of equations A*X = B.

Input Parameters




B; nrhs ≥ 0.

nrhs

REAL for sppsvap, bDOUBLE PRECISION for dppsvCOMPLEX for cppsvDOUBLE COMPLEX for zppsv.Arrays: ap(*), b(ldb,*). The array ap contains the upper or the lowertriangular part of the matrix A (as specified by uplo) in packedstorage (see Matrix Storage Schemes). The dimension of ap must beat least max(1,n(n+1)/2).The array b contains the matrix B whose columns are the right-handsides for the systems of equations. The second dimension of b mustbe at least max(1,nrhs).


506


Output Parameters

If info = 0, the upper or lower triangular part of A in packed storageis overwritten by the Cholesky factor U or L, as specified by uplo.

ap


INTEGER. If info = 0, the execution is successful.infoIf info = -i, the i-th parameter had an illegal value.If info = i, the leading minor of order i (and therefore the matrix Aitself) is not positive-definite, so the factorization could not becompleted, and the solution has not been computed.



Specific details for the routine ppsv interface are as follows:


a



507


?ppsvxUses the Cholesky factorization to compute thesolution to the system of linear equations with asymmetric (Hermitian) positive definite packedmatrix A, and provides error bounds on thesolution.

Syntax

Fortran 77:

call sppsvx( fact, uplo, n, nrhs, ap, afp, equed, s, b, ldb, x, ldx, rcond,ferr, berr, work, iwork, info )

call dppsvx( fact, uplo, n, nrhs, ap, afp, equed, s, b, ldb, x, ldx, rcond,ferr, berr, work, iwork, info )

call cppsvx( fact, uplo, n, nrhs, ap, afp, equed, s, b, ldb, x, ldx, rcond,ferr, berr, work, rwork, info )

call zppsvx( fact, uplo, n, nrhs, ap, afp, equed, s, b, ldb, x, ldx, rcond,ferr, berr, work, rwork, info )

Fortran 95:

call ppsvx( a, b, x [,uplo] [,af] [,fact] [,equed] [,s] [,ferr] [,berr] [,rcond][,info] )

Description

This routine uses the Cholesky factorization A=UT*U (real flavors) / A=UH*U (complex flavors)or A=L*LT (real flavors) / A=L*LH (complex flavors) to compute the solution to a real or complexsystem of linear equations A*X = B, where A is a n-by-n symmetric or Hermitian positive-definitematrix stored in packed format, the columns of matrix B are individual right-hand sides, andthe columns of X are the corresponding solutions.


The routine ?ppsvx performs the following steps:



508







3. If the leading i-by-i principal minor is not positive-definite, then the routine returns withinfo = i. Otherwise, the factored form of A is used to estimate the condition number ofthe matrix A. If the reciprocal of the condition number is less than machine precision, info= n+1 is returned as a warning, but the routine still goes on to solve for X and computeerror bounds as described below.




Input Parameters

CHARACTER*1. Must be 'F', 'N', or 'E'.factSpecifies whether or not the factored form of the matrix A is suppliedon entry, and if not, whether the matrix A should be equilibrated beforeit is factored.If fact = 'F': on entry, afp contains the factored form of A. If equed= 'Y', the matrix A has been equilibrated with scaling factors givenby s.

ap and afp will not be modified.If fact = 'N', the matrix A will be copied to afp and factored.If fact = 'E', the matrix A will be equilibrated if necessary, thencopied to afp and factored.

CHARACTER*1. Must be 'U' or 'L'.uploIndicates whether the upper or lower triangular part of A is stored:If uplo = 'U', the upper triangle of A is stored.

509




INTEGER. The number of right-hand sides; the number of columns in

B; nrhs ≥ 0.

nrhs

REAL for sppsvxap,afp,b,workDOUBLE PRECISION for dppsvxCOMPLEX for cppsvxDOUBLE COMPLEX for zppsvx.Arrays: ap(*), afp(*), b(ldb,*), work(*).The array ap contains the upper or lower triangle of the originalsymmetric/Hermitian matrix A in packed storage (see Matrix StorageSchemes). In case when fact = 'F' and equed = 'Y', ap mustcontain the equilibrated matrix diag(s)*A*diag(s).The array afp is an input argument if fact = 'F' and contains thetriangular factor U or L from the Cholesky factorization of A in the samestorage format as A. If equed is not 'N', then afp is the factored formof the equilibrated matrix A.The array b contains the matrix B whose columns are the right-handsides for the systems of equations.work(*) is a workspace array.The dimension of arrays ap and afp must be at least max(1,n(n+1)/2); the second dimension of b must be at least max(1,nrhs);the dimension of work must be at least max(1, 3*n) for real flavorsand max(1, 2*n) for complex flavors.


CHARACTER*1. Must be 'N' or 'Y'.equedequed is an input argument if fact = 'F'. It specifies the form ofequilibration that was done:if equed = 'N', no equilibration was done (always true if fact ='N');if equed = 'Y', equilibration was done and A has been replaced bydiag(s)A*diag(s).

REAL for single precision flavorssDOUBLE PRECISION for double precision flavors.

510


Array, DIMENSION (n). The array s contains the scale factors for A. Thisarray is an input argument if fact = 'F' only; otherwise it is an outputargument.If equed = 'N', s is not accessed.If fact = 'F' and equed = 'Y', each element of s must be positive.


ldx


iwork

REAL for cppsvx;rworkDOUBLE PRECISION for zppsvx.Workspace array, DIMENSION at least max(1, n); used in complexflavors only.

Output Parameters

REAL for sppsvxxDOUBLE PRECISION for dppsvxCOMPLEX for cppsvxDOUBLE COMPLEX for zppsvx.Array, DIMENSION (ldx,*).If info = 0 or info = n+1, the array x contains the solution matrixX to the original system of equations. Note that if equed = 'Y', A andB are modified on exit, and the solution to the equilibrated system isinv(diag(s))*X. The second dimension of x must be at leastmax(1,nrhs).

Array ap is not modified on exit if fact = 'F' or 'N', or if fact ='E' and equed = 'N'.

ap

If fact = 'E' and equed = 'Y', A is overwritten bydiag(s)*A*diag(s).

If fact = 'N' or 'E', then afp is an output argument and on exitreturns the triangular factor U or L from the Cholesky factorizationA=UT*U or A=L*LT (real routines), A=UH*U or A=L*LH (complex routines)

afp

of the original matrix A (if fact = 'N'), or of the equilibrated matrixA (if fact = 'E'). See the description of ap for the form of theequilibrated matrix.

511



b


s

REAL for single precision flavorsrcondDOUBLE PRECISION for double precision flavors.An estimate of the reciprocal condition number of the matrix A afterequilibration (if done). If rcond is less than the machine precision (inparticular, if rcond = 0), the matrix is singular to working precision.This condition is indicated by a return code of info > 0.



equed


If info = i, and i ≤ n, the leading minor of order i (and thereforethe matrix A itself) is not positive-definite, so the factorization couldnot be completed, and the solution and error bounds could not becomputed; rcond = 0 is returned.If info = i, and i = n + 1, then U is nonsingular, but rcond is lessthan machine precision, meaning that the matrix is singular to workingprecision. Nevertheless, the solution and error bounds are computedbecause there are a number of situations where the computed solutioncan be more accurate than the value of rcond would suggest.



Specific details for the routine ppsvx interface are as follows:

512



a



Stands for argument afp in Fortan 77 interface. Holds the matrix AFof size (n*(n+1)/2).

af


s





fact


?pbsvComputes the solution to the system of linearequations with a symmetric or Hermitianpositive-definite band matrix A and multipleright-hand sides.

Syntax

Fortran 77:

call spbsv( uplo, n, kd, nrhs, ab, ldab, b, ldb, info )

call dpbsv( uplo, n, kd, nrhs, ab, ldab, b, ldb, info )

call cpbsv( uplo, n, kd, nrhs, ab, ldab, b, ldb, info )

call zpbsv( uplo, n, kd, nrhs, ab, ldab, b, ldb, info )

Fortran 95:

call pbsv( a, b [,uplo] [,info] )

513


Description

This routine solves for X the real or complex system of linear equations A*X = B, where A isan n-by-n symmetric/Hermitian positive definite band matrix, the columns of matrix B areindividual right-hand sides, and the columns of X are the corresponding solutions.




where U is an upper triangular band matrix and L is a lower triangular band matrix, with thesame number of superdiagonals or subdiagonals as A. The factored form of A is then used tosolve the system of equations A*X = B.

Input Parameters



INTEGER. The number of superdiagonals of the matrix A if uplo = 'U',

or the number of subdiagonals if uplo = 'L'; kd ≥ 0.

kd


B; nrhs ≥ 0.

nrhs

REAL for spbsvab, bDOUBLE PRECISION for dpbsvCOMPLEX for cpbsvDOUBLE COMPLEX for zpbsv.Arrays: ab(ldab, *), b(ldb,*). The array ab contains the upper orthe lower triangular part of the matrix A (as specified by uplo) in bandstorage (see Matrix Storage Schemes). The second dimension of abmust be at least max(1, n).The array b contains the matrix B whose columns are the right-handsides for the systems of equations. The second dimension of b mustbe at least max(1,nrhs).


514



Output Parameters

The upper or lower triangular part of A (in band storage) is overwrittenby the Cholesky factor U or L, as specified by uplo, in the same storageformat as A.

ab


INTEGER. If info = 0, the execution is successful.infoIf info = -i, the i-th parameter had an illegal value.If info = i, the leading minor of order i (and therefore the matrix Aitself) is not positive-definite, so the factorization could not becompleted, and the solution has not been computed.



Specific details for the routine pbsv interface are as follows:


a



515


?pbsvxUses the Cholesky factorization to compute thesolution to the system of linear equations with asymmetric (Hermitian) positive-definite bandmatrix A, and provides error bounds on thesolution.

Syntax

Fortran 77:

call spbsvx( fact, uplo, n, kd, nrhs, ab, ldab, afb, ldafb, equed, s, b, ldb,x, ldx, rcond, ferr, berr, work, iwork, info )

call dpbsvx( fact, uplo, n, kd, nrhs, ab, ldab, afb, ldafb, equed, s, b, ldb,x, ldx, rcond, ferr, berr, work, iwork, info )

call cpbsvx( fact, uplo, n, kd, nrhs, ab, ldab, afb, ldafb, equed, s, b, ldb,x, ldx, rcond, ferr, berr, work, iwork, info )

call zpbsvx( fact, uplo, n, kd, nrhs, ab, ldab, afb, ldafb, equed, s, b, ldb,x, ldx, rcond, ferr, berr, work, iwork, info )

Fortran 95:

call pbsvx( a, b, x [,uplo] [,af] [,fact] [,equed] [,s] [,ferr] [,berr] [,rcond][,info] )

Description

This routine uses the Cholesky factorization A=UT*U (real flavors) / A=UH*U (complex flavors)or A=L*LT (real flavors) / A=L*LH (complex flavors) to compute the solution to a real or complexsystem of linear equations A*X = B, where A is a n-by-n symmetric or Hermitian positivedefinite band matrix, the columns of matrix B are individual right-hand sides, and the columnsof X are the corresponding solutions.


The routine ?pbsvx performs the following steps:



516






where U is an upper triangular band matrix and L is a lower triangular band matrix.

3. If the leading i-by-i principal minor is not positive definite, then the routine returns withinfo = i. Otherwise, the factored form of A is used to estimate the condition number ofthe matrix A. If the reciprocal of the condition number is less than machine precision, info= n+1 is returned as a warning, but the routine still goes on to solve for X and computeerror bounds as described below.




Input Parameters

CHARACTER*1. Must be 'F', 'N', or 'E'.factSpecifies whether or not the factored form of the matrix A is suppliedon entry, and if not, whether the matrix A should be equilibrated beforeit is factored.If fact = 'F': on entry, afb contains the factored form of A. If equed= 'Y', the matrix A has been equilibrated with scaling factors givenby s.

ab and afb will not be modified.If fact = 'N', the matrix A will be copied to afb and factored.If fact = 'E', the matrix A will be equilibrated if necessary, thencopied to afb and factored.


517





A; kd ≥ 0.

kd


B; nrhs ≥ 0.

nrhs

REAL for spbsvxab,afb,b,workDOUBLE PRECISION for dpbsvxCOMPLEX for cpbsvxDOUBLE COMPLEX for zpbsvx.Arrays: ab(ldab,*), afb(ldab,*), b(ldb,*), work(*).The array ab contains the upper or lower triangle of the matrix A inband storage (see Matrix Storage Schemes).If fact = 'F' and equed = 'Y', then ab must contain the equilibratedmatrix diag(s)*A*diag(s). The second dimension of ab must be atleast max(1, n).The array afb is an input argument if fact = 'F'. It contains thetriangular factor U or L from the Cholesky factorization of the bandmatrix A in the same storage format as A. If equed = 'Y', then afbis the factored form of the equilibrated matrix A. The second dimensionof afb must be at least max(1, n).The array b contains the matrix B whose columns are the right-handsides for the systems of equations. The second dimension of b mustbe at least max(1, nrhs).work(*) is a workspace array.The dimension of work must be at least max(1,3*n) for real flavors,and at least max(1,2*n) for complex flavors.

INTEGER. The first dimension of ab; ldab ≥ kd+1.ldab

INTEGER. The first dimension of afb; ldafb ≥ kd+1.ldafb


CHARACTER*1. Must be 'N' or 'Y'.equedequed is an input argument if fact = 'F'. It specifies the form ofequilibration that was done:if equed = 'N', no equilibration was done (always true if fact = 'N')

518


if equed = 'Y', equilibration was done and A has been replaced bydiag(s)*A*diag(s).

REAL for single precision flavorssDOUBLE PRECISION for double precision flavors.Array, DIMENSION (n). The array s contains the scale factors for A. Thisarray is an input argument if fact = 'F' only; otherwise it is an outputargument.If equed = 'N', s is not accessed.If fact = 'F' and equed = 'Y', each element of s must be positive.


ldx


iwork

REAL for cpbsvxrworkDOUBLE PRECISION for zpbsvx.Workspace array, DIMENSION at least max(1, n); used in complexflavors only.

Output Parameters

REAL for spbsvxxDOUBLE PRECISION for dpbsvxCOMPLEX for cpbsvxDOUBLE COMPLEX for zpbsvx.Array, DIMENSION (ldx,*).If info = 0 or info = n+1, the array x contains the solution matrix Xto the original system of equations. Note that if equed = 'Y', A andB are modified on exit, and the solution to the equilibrated system isinv(diag(s))*X. The second dimension of x must be at leastmax(1,nrhs).

On exit, if fact = 'E' and equed = 'Y', A is overwritten bydiag(s)*A*diag(s).

ab

If fact = 'N' or 'E', then afb is an output argument and on exitreturns the triangular factor U or L from the Cholesky factorizationA=UT*U or A=L*LT (real routines), A=UH*U or A=L*LH (complex routines)

afb

519


of the original matrix A (if fact = 'N'), or of the equilibrated matrixA (if fact = 'E'). See the description of ab for the form of theequilibrated matrix.


b


s




equed


If info = i, and i ≤ n, the leading minor of order i (and thereforethe matrix A itself) is not positive definite, so the factorization couldnot be completed, and the solution and error bounds could not becomputed; rcond =0 is returned. If info = i, and i = n + 1, then Uis nonsingular, but rcond is less than machine precision, meaning thatthe matrix is singular to working precision. Nevertheless, the solutionand error bounds are computed because there are a number ofsituations where the computed solution can be more accurate than thevalue of rcond would suggest.

520




Specific details for the routine pbsvx interface are as follows:


a



Stands for argument afb in Fortan 77 interface. Holds the array AF ofsize (kd+1,n).

af


s





fact


?ptsvComputes the solution to the system of linearequations with a symmetric or Hermitian positivedefinite tridiagonal matrix A and multipleright-hand sides.

Syntax

Fortran 77:

call sptsv( n, nrhs, d, e, b, ldb, info )

call dptsv( n, nrhs, d, e, b, ldb, info )

call cptsv( n, nrhs, d, e, b, ldb, info )

call zptsv( n, nrhs, d, e, b, ldb, info )

521


Fortran 95:

call ptsv( d, e, b [,info] )

Description

This routine solves for X the real or complex system of linear equations A*X = B, where A isan n-by-n symmetric/Hermitian positive-definite tridiagonal matrix, the columns of matrix Bare individual right-hand sides, and the columns of X are the corresponding solutions.

A is factored as A = L*D*LT (real flavors) or A = L*D*LH (complex flavors), and the factoredform of A is then used to solve the system of equations A*X = B.

Input Parameters



B; nrhs ≥ 0.

nrhs

REAL for single precision flavorsdDOUBLE PRECISION for double precision flavors.Array, dimension at least max(1, n). Contains the diagonal elementsof the tridiagonal matrix A.

REAL for sptsve, bDOUBLE PRECISION for dptsvCOMPLEX for cptsvDOUBLE COMPLEX for zptsv.Arrays: e(n - 1) , b(ldb,*). The array e contains the (n - 1)subdiagonal elements of A.The array b contains the matrix B whose columns are the right-handsides for the systems of equations. The second dimension of b mustbe at least max(1,nrhs).


Output Parameters

Overwritten by the n diagonal elements of the diagonal matrix D fromthe L*D*LT (real)/ L*D*LH (complex) factorization of A.

d

Overwritten by the (n - 1) subdiagonal elements of the unit bidiagonalfactor L from the factorization of A.

e

522



INTEGER. If info = 0, the execution is successful.infoIf info = -i, the i-th parameter had an illegal value.If info = i, the leading minor of order i (and therefore the matrix Aitself) is not positive-definite, and the solution has not been computed.The factorization has not been completed unless i = n.



Specific details for the routine ptsv interface are as follows:




?ptsvxUses factorization to compute the solution to thesystem of linear equations with a symmetric(Hermitian) positive definite tridiagonal matrix A,and provides error bounds on the solution.

Syntax

Fortran 77:

call sptsvx( fact, n, nrhs, d, e, df, ef, b, ldb, x, ldx, rcond, ferr, berr,work, info )

call dptsvx( fact, n, nrhs, d, e, df, ef, b, ldb, x, ldx, rcond, ferr, berr,work, info )

call cptsvx( fact, n, nrhs, d, e, df, ef, b, ldb, x, ldx, rcond, ferr, berr,work, rwork, info )

call zptsvx( fact, n, nrhs, d, e, df, ef, b, ldb, x, ldx, rcond, ferr, berr,work, rwork, info )

523


Fortran 95:

call ptsvx( d, e, b, x [,df] [,ef] [,fact] [,ferr] [,berr] [,rcond] [,info] )

Description

This routine uses the Cholesky factorization A = L*D*LT (real)/A = L*D*LH (complex) tocompute the solution to a real or complex system of linear equations A*X = B, where A is an-by-n symmetric or Hermitian positive definite tridiagonal matrix, the columns of matrix B areindividual right-hand sides, and the columns of X are the corresponding solutions.


The routine ?ptsvx performs the following steps:

1. If fact = 'N', the matrix A is factored as A = L*D*LT (real flavors)/A = L*D*LH (complexflavors), where L is a unit lower bidiagonal matrix and D is diagonal. The factorization canalso be regarded as having the form A = UT*D*U (real flavors)/A = UH*D*U (complexflavors).

2. If the leading i-by-i principal minor is not positive-definite, then the routine returns withinfo = i. Otherwise, the factored form of A is used to estimate the condition number ofthe matrix A. If the reciprocal of the condition number is less than machine precision, info= n+1 is returned as a warning, but the routine still goes on to solve for X and computeerror bounds as described below.



Input Parameters

CHARACTER*1. Must be 'F' or 'N'.factSpecifies whether or not the factored form of the matrix A is suppliedon entry.If fact = 'F': on entry, df and ef contain the factored form of A.Arrays d, e, df, and ef will not be modified.If fact = 'N', the matrix A will be copied to df and ef , and factored.



B; nrhs ≥ 0.

nrhs

524


REAL for single precision flavorsd, df, rworkDOUBLE PRECISION for double precision flavors.Arrays: d(n), df(n), rwork(n).The array d contains the n diagonal elements of the tridiagonal matrixA.The array df is an input argument if fact = 'F' and on entry containsthe n diagonal elements of the diagonal matrix D from the L*D*LT (real)/L*D*LH (complex) factorization of A.The array rwork is a workspace array used for complex flavors only.

REAL for sptsvxe,ef,b,workDOUBLE PRECISION for dptsvxCOMPLEX for cptsvxDOUBLE COMPLEX for zptsvx.Arrays: e(n -1), ef(n -1), b(ldb*), work(*). The array e containsthe (n - 1) subdiagonal elements of the tridiagonal matrix A.The array ef is an input argument if fact = 'F' and on entry containsthe (n - 1) subdiagonal elements of the unit bidiagonal factor L fromthe L*D*LT (real)/ L*D*LH (complex) factorization of A.The array b contains the matrix B whose columns are the right-handsides for the systems of equations.The array work is a workspace array. The dimension of work must beat least 2*n for real flavors, and at least n for complex flavors.


INTEGER. The leading dimension of x; ldx ≥ max(1, n).ldx

Output Parameters

REAL for sptsvxxDOUBLE PRECISION for dptsvxCOMPLEX for cptsvxDOUBLE COMPLEX for zptsvx.Array, DIMENSION (ldx,*).If info = 0 or info = n+1, the array x contains the solution matrixX to the system of equations. The second dimension of x must be atleast max(1,nrhs).

These arrays are output arguments if fact = 'N'. See the descriptionof df, ef in Input Arguments section.

df, ef

525





If info = i, and i ≤ n, the leading minor of order i (and thereforethe matrix A itself) is not positive-definite, so the factorization couldnot be completed, and the solution and error bounds could not becomputed; rcond =0 is returned.If info = i, and i = n + 1, then U is nonsingular, but rcond is lessthan machine precision, meaning that the matrix is singular to workingprecision. Nevertheless, the solution and error bounds are computedbecause there are a number of situations where the computed solutioncan be more accurate than the value of rcond would suggest.



Specific details for the routine ptsvx interface are as follows:






Holds the vector of length (n-1).ef


526



Must be 'N' or 'F'. The default value is 'N'. If fact = 'F', thenboth arguments af and ipiv must be present; otherwise, an error isreturned.

fact

?sysvComputes the solution to the system of linearequations with a real or complex symmetric matrixA and multiple right-hand sides.

Syntax

Fortran 77:

call ssysv( uplo, n, nrhs, a, lda, ipiv, b, ldb, work, lwork, info )

call dsysv( uplo, n, nrhs, a, lda, ipiv, b, ldb, work, lwork, info )

call csysv( uplo, n, nrhs, a, lda, ipiv, b, ldb, work, lwork, info )

call zsysv( uplo, n, nrhs, a, lda, ipiv, b, ldb, work, lwork, info )

Fortran 95:

call sysv( a, b [,uplo] [,ipiv] [,info] )

Description

This routine solves for X the real or complex system of linear equations A*X = B, where A isan n-by-n symmetric matrix, the columns of matrix B are individual right-hand sides, and thecolumns of X are the corresponding solutions.

The diagonal pivoting method is used to factor A as A = U*D*UT or A = L*D*LT , where U (orL) is a product of permutation and unit upper (lower) triangular matrices, and D is symmetricand block diagonal with 1-by-1 and 2-by-2 diagonal blocks.

The factored form of A is then used to solve the system of equations A*X = B.

Input Parameters


527





B; nrhs ≥ 0.

nrhs

REAL for ssysva, b, workDOUBLE PRECISION for dsysvCOMPLEX for csysvDOUBLE COMPLEX for zsysv.Arrays: a(lda,*), b(ldb,*), work(*).The array a contains the upper or the lower triangular part of thesymmetric matrix A (see uplo). The second dimension of a must be atleast max(1, n).The array b contains the matrix B whose columns are the right-handsides for the systems of equations. The second dimension of b mustbe at least max(1,nrhs).work is a workspace array, dimension at least max(1,lwork).



INTEGER. The size of the work array; lwork ≥ 1.lwork

If lwork = -1, then a workspace query is assumed; the routine onlycalculates the optimal size of the work array, returns this value as thefirst entry of the work array, and no error message related to lwork isissued by xerbla. See Application Notes below for details and for thesuggested value of lwork.

Output Parameters

If info = 0, a is overwritten by the block-diagonal matrix D and themultipliers used to obtain the factor U (or L) from the factorization ofA as computed by ?sytrf.

a

If info = 0, b is overwritten by the solution matrix X.b

INTEGER.ipivArray, DIMENSION at least max(1, n). Contains details of theinterchanges and the block structure of D, as determined by ?sytrf.

528


If ipiv(i) = k >0, then dii is a 1-by-1 diagonal block, and the i-throw and column of A was interchanged with the k-th row and column.If uplo = 'U' and ipiv(i) = ipiv(i-1) = -m < 0, then D has a 2-by-2block in rows/columns i and i-1, and (i-1)-th row and column of Awas interchanged with the m-th row and column.If uplo = 'L' and ipiv(i) = ipiv(i+1) = -m < 0, then D has a 2-by-2block in rows/columns i and i+1, and (i+1)-th row and column of Awas interchanged with the m-th row and column.


work(1)

INTEGER. If info = 0, the execution is successful.infoIf info = -i, the i-th parameter had an illegal value.If info = i, dii is 0. The factorization has been completed, but D isexactly singular, so the solution could not be computed.



Specific details for the routine sysv interface are as follows:





Application Notes




529



?sysvxUses the diagonal pivoting factorization to computethe solution to the system of linear equations witha real or complex symmetric matrix A, and provideserror bounds on the solution.

Syntax

Fortran 77:

call ssysvx( fact, uplo, n, nrhs, a, lda, af, ldaf, ipiv, b, ldb, x, ldx,rcond, ferr, berr, work, lwork, iwork, info )

call dsysvx( fact, uplo, n, nrhs, a, lda, af, ldaf, ipiv, b, ldb, x, ldx,rcond, ferr, berr, work, lwork, iwork, info )

call csysvx( fact, uplo, n, nrhs, a, lda, af, ldaf, ipiv, b, ldb, x, ldx,rcond, ferr, berr, work, lwork, rwork, info )

call zsysvx( fact, uplo, n, nrhs, a, lda, af, ldaf, ipiv, b, ldb, x, ldx,rcond, ferr, berr, work, lwork, rwork, info )

Fortran 95:

call sysvx( a, b, x [,uplo] [,af] [,ipiv] [,fact] [,ferr] [,berr] [,rcond][,info] )

Description

This routine uses the diagonal pivoting factorization to compute the solution to a real or complexsystem of linear equations A*X = B, where A is a n-by-n symmetric matrix, the columns ofmatrix B are individual right-hand sides, and the columns of X are the corresponding solutions.


The routine ?sysvx performs the following steps:

530


1. If fact = 'N', the diagonal pivoting method is used to factor the matrix A. The form of thefactorization is A = U*D*UT or A = L*D*LT, where U (or L) is a product of permutation andunit upper (lower) triangular matrices, and D is symmetric and block diagonal with 1-by-1and 2-by-2 diagonal blocks.

2. If some di,i= 0, so that D is exactly singular, then the routine returns with info = i.Otherwise, the factored form of A is used to estimate the condition number of the matrix A.If the reciprocal of the condition number is less than machine precision, info = n+1 isreturned as a warning, but the routine still goes on to solve for X and compute error boundsas described below.



Input Parameters

CHARACTER*1. Must be 'F' or 'N'.factSpecifies whether or not the factored form of the matrix A has beensupplied on entry.If fact = 'F': on entry, af and ipiv contain the factored form of A.Arrays a, af, and ipiv will not be modified.If fact = 'N', the matrix A will be copied to af and factored.




B; nrhs ≥ 0.

nrhs

REAL for ssysvxa,af,b,workDOUBLE PRECISION for dsysvxCOMPLEX for csysvxDOUBLE COMPLEX for zsysvx.Arrays: a(lda,*), af(ldaf,*), b(ldb,*), work(*).The array a contains the upper or the lower triangular part of thesymmetric matrix A (see uplo). The second dimension of a must be atleast max(1,n).

531


The array af is an input argument if fact = 'F'. It contains he blockdiagonal matrix D and the multipliers used to obtain the factor U or Lfrom the factorization A = U*D*UT orA = L*D*LT as computed by?sytrf. The second dimension of af must be at least max(1,n).The array b contains the matrix B whose columns are the right-handsides for the systems of equations. The second dimension of b mustbe at least max(1, nrhs).work(*) is a workspace array, dimension at least max(1,lwork).




INTEGER.ipivArray, DIMENSION at least max(1, n). The array ipiv is an inputargument if fact = 'F'. It contains details of the interchanges andthe block structure of D, as determined by ?sytrf.If ipiv(i) = k > 0, then dii is a 1-by-1 diagonal block, and the i-throw and column of A was interchanged with the k-th row and column.If uplo = 'U' and ipiv(i) = ipiv(i-1) = -m < 0, then D has a2-by-2 block in rows/columns i and i-1, and (i-1)-th row and columnof A was interchanged with the m-th row and column.If uplo = 'L' and ipiv(i) = ipiv(i+1) = -m < 0, then D has a2-by-2 block in rows/columns i and i+1, and (i+1)-th row and columnof A was interchanged with the m-th row and column.

INTEGER. The leading dimension of the output array x; ldx ≥ max(1,n).

ldx

INTEGER. The size of the work array.lworkIf lwork = -1, then a workspace query is assumed; the routine onlycalculates the optimal size of the work array, returns this value as thefirst entry of the work array, and no error message related to lwork isissued by xerbla. See Application Notes below for details and for thesuggested value of lwork.


iwork

REAL for csysvx;rworkDOUBLE PRECISION for zsysvx.

532

Workspace array, DIMENSION at least max(1, n); used in complexflavors only.

Output Parameters

REAL for ssysvxxDOUBLE PRECISION for dsysvxCOMPLEX for csysvxDOUBLE COMPLEX for zsysvx.Array, DIMENSION (ldx,*).If info = 0 or info = n+1, the array x contains the solution matrixX to the system of equations. The second dimension of x must be atleast max(1,nrhs).

These arrays are output arguments if fact = 'N'.af, ipivSee the description of af, ipiv in Input Arguments section.

REAL for single precision flavorsrcondDOUBLE PRECISION for double precision flavors.An estimate of the reciprocal condition number of the matrix A. If rcondis less than the machine precision (in particular, if rcond = 0), thematrix is singular to working precision. This condition is indicated by areturn code of info > 0.


If info=0, on exit work(1) contains the minimum value of lworkrequired for optimum performance. Use this lwork for subsequent runs.

work(1)


If info = i, and i ≤ n, then dii is exactly zero. The factorization hasbeen completed, but the block diagonal matrix D is exactly singular, sothe solution and error bounds could not be computed; rcond = 0 isreturned.

533


If info = i, and i = n + 1, then D is nonsingular, but rcond is lessthan machine precision, meaning that the matrix is singular to workingprecision. Nevertheless, the solution and error bounds are computedbecause there are a number of situations where the computed solutioncan be more accurate than the value of rcond would suggest.



Specific details for the routine sysvx interface are as follows:










fact

Application Notes

For real flavors, lwork must be at least 3*n, and for complex flavors at least 2*n. For betterperformance, try using lwork = n*blocksize, where blocksize is the optimal block size for?sytrf.



534




?hesvComputes the solution to the system of linearequations with a Hermitian matrix A and multipleright-hand sides.

Syntax

Fortran 77:

call chesv( uplo, n, nrhs, a, lda, ipiv, b, ldb, work, lwork, info )

call zhesv( uplo, n, nrhs, a, lda, ipiv, b, ldb, work, lwork, info )

Fortran 95:

call hesv( a, b [,uplo] [,ipiv] [,info] )

Description

This routine solves for X the complex system of linear equations A*X = B, where A is an n-by-nsymmetric matrix, the columns of matrix B are individual right-hand sides, and the columns ofX are the corresponding solutions.

The diagonal pivoting method is used to factor A as A = U*D*UH or A = L*D*LH , where U (orL) is a product of permutation and unit upper (lower) triangular matrices, and D is Hermitianand block diagonal with 1-by-1 and 2-by-2 diagonal blocks.


Input Parameters

CHARACTER*1. Must be 'U' or 'L'.uploIndicates whether the upper or lower triangular part of A is stored andhow A is factored:

535


If uplo = 'U', the array a stores the upper triangular part of thematrix A, and A is factored as U*D*UH.If uplo = 'L', the array a stores the lower triangular part of the matrixA, and A is factored as L*D*LH.



B; nrhs ≥ 0.

nrhs

COMPLEX for chesva, b, workDOUBLE COMPLEX for zhesv.Arrays: a(lda,*), b(ldb,*), work(*). The array a contains the upperor the lower triangular part of the Hermitian matrix A (see uplo). Thesecond dimension of a must be at least max(1, n).The array b contains the matrix B whose columns are the right-handsides for the systems of equations. The second dimension of b mustbe at least max(1,nrhs).work is a workspace array, dimension at least max(1,lwork).



INTEGER. The size of the work array (lwork ≥ 1).lwork

If lwork = -1, then a workspace query is assumed; the routine onlycalculates the optimal size of the work array, returns this value as thefirst entry of the work array, and no error message related to lwork isissued by xerbla. See Application Notes below for details and for thesuggested value of lwork.

Output Parameters

If info = 0, a is overwritten by the block-diagonal matrix D and themultipliers used to obtain the factor U (or L) from the factorization ofA as computed by ?hetrf.

a


INTEGER.ipivArray, DIMENSION at least max(1, n). Contains details of theinterchanges and the block structure of D, as determined by ?hetrf.

536


If ipiv(i) = k > 0, then dii is a 1-by-1 diagonal block, and the i-throw and column of A was interchanged with the k-th row and column.If uplo = 'U' and ipiv(i) =ipiv(i-1) = -m < 0, then D has a 2-by-2block in rows/columns i and i-1, and (i-1)-th row and column of Awas interchanged with the m-th row and column.If uplo = 'L' and ipiv(i) =ipiv(i+1) = -m < 0, then D has a 2-by-2block in rows/columns i and i+1, and (i+1)-th row and column of Awas interchanged with the m-th row and column.


work(1)




Specific details for the routine hesv interface are as follows:





Application Notes




537



?hesvxUses the diagonal pivoting factorization to computethe solution to the complex system of linearequations with a Hermitian matrix A, and provideserror bounds on the solution.

Syntax

Fortran 77:

call chesvx( fact, uplo, n, nrhs, a, lda, af, ldaf, ipiv, b, ldb, x, ldx,rcond, ferr, berr, work, lwork, rwork, info )

call zhesvx( fact, uplo, n, nrhs, a, lda, af, ldaf, ipiv, b, ldb, x, ldx,rcond, ferr, berr, work, lwork, rwork, info )

Fortran 95:

call hesvx( a, b, x [,uplo] [,af] [,ipiv] [,fact] [,ferr] [,berr] [,rcond][,info] )

Description

This routine uses the diagonal pivoting factorization to compute the solution to a complexsystem of linear equations A*X = B, where A is an n-by-n Hermitian matrix, the columns ofmatrix B are individual right-hand sides, and the columns of X are the corresponding solutions.


The routine ?hesvx performs the following steps:

1. If fact = 'N', the diagonal pivoting method is used to factor the matrix A. The form of thefactorization is A = U*D*UH or A = L*D*LH, where U (or L) is a product of permutation andunit upper (lower) triangular matrices, and D is Hermitian and block diagonal with 1-by-1and 2-by-2 diagonal blocks.

538


2. If some di,i = 0, so that D is exactly singular, then the routine returns with info = i.Otherwise, the factored form of A is used to estimate the condition number of the matrix A.If the reciprocal of the condition number is less than machine precision, info = n+1 isreturned as a warning, but the routine still goes on to solve for X and compute error boundsas described below.



Input Parameters

CHARACTER*1. Must be 'F' or 'N'.factSpecifies whether or not the factored form of the matrix A has beensupplied on entry.If fact = 'F': on entry, af and ipiv contain the factored form of A.Arrays a, af, and ipiv are not modified.If fact = 'N', the matrix A is copied to af and factored.

CHARACTER*1. Must be 'U' or 'L'.uploIndicates whether the upper or lower triangular part of A is stored andhow A is factored:If uplo = 'U', the array a stores the upper triangular part of theHermitian matrix A, and A is factored as U*D*UH.If uplo = 'L', the array a stores the lower triangular part of theHermitian matrix A; A is factored as L*D*LH.



B; nrhs ≥ 0.

nrhs

COMPLEX for chesvxa,af,b,workDOUBLE COMPLEX for zhesvx.Arrays: a(lda,*), af(ldaf,*), b(ldb,*), work(*).The array a contains the upper or the lower triangular part of theHermitian matrix A (see uplo). The second dimension of a must be atleast max(1,n).

539


The array af is an input argument if fact = 'F'. It contains he blockdiagonal matrix D and the multipliers used to obtain the factor U or Lfrom the factorization A = U*D*UH or A = L*D*LH as computed by?hetrf. The second dimension of af must be at least max(1,n).The array b contains the matrix B whose columns are the right-handsides for the systems of equations. The second dimension of b mustbe at least max(1, nrhs).work(*) is a workspace array of dimension at least max(1, lwork).




INTEGER.ipivArray, DIMENSION at least max(1, n). The array ipiv is an inputargument if fact = 'F'. It contains details of the interchanges andthe block structure of D, as determined by ?hetrf.If ipiv(i) = k > 0, then dii is a 1-by-1 diagonal block, and the i-throw and column of A was interchanged with the k-th row and column.If uplo = 'U' and ipiv(i) =ipiv(i-1) = -m < 0, then D has a 2-by-2block in rows/columns i and i-1, and (i-1)-th row and column of Awas interchanged with the m-th row and column.If uplo = 'L' and ipiv(i) =ipiv(i+1) = -m < 0, then D has a 2-by-2block in rows/columns i and i+1, and (i+1)-th row and column of Awas interchanged with the m-th row and column.


ldx

INTEGER. The size of the work array.lworkIf lwork = -1, then a workspace query is assumed; the routine onlycalculates the optimal size of the work array, returns this value as thefirst entry of the work array, and no error message related to lwork isissued by xerbla. See Application Notes below for details and for thesuggested value of lwork.

REAL for chesvxrworkDOUBLE PRECISION for zhesvx.Workspace array, DIMENSION at least max(1, n).

540

Output Parameters

COMPLEX for chesvxxDOUBLE COMPLEX for zhesvx.Array, DIMENSION (ldx,*).If info = 0 or info = n+1, the array x contains the solution matrixX to the system of equations. The second dimension of x must be atleast max(1,nrhs).

These arrays are output arguments if fact = 'N'. See the descriptionof af, ipiv in Input Arguments section.

af, ipiv

REAL for chesvxrcondDOUBLE PRECISION for zhesvx.An estimate of the reciprocal condition number of the matrix A. If rcondis less than the machine precision (in particular, if rcond = 0), thematrix is singular to working precision. This condition is indicated by areturn code of info > 0.

REAL for chesvxferr, berrDOUBLE PRECISION for zhesvx.Arrays, DIMENSION at least max(1, nrhs). Contain the component-wiseforward and relative backward errors, respectively, for each solutionvector.


work(1)


If info = i, and i ≤ n, then dii is exactly zero. The factorization hasbeen completed, but the block diagonal matrix D is exactly singular, sothe solution and error bounds could not be computed; rcond = 0 isreturned.If info = i, and i = n + 1, then D is nonsingular, but rcond is lessthan machine precision, meaning that the matrix is singular to workingprecision. Nevertheless, the solution and error bounds are computedbecause there are a number of situations where the computed solutioncan be more accurate than the value of rcond would suggest.

541




Specific details for the routine hesvx interface are as follows:










fact

Application Notes

The value of lwork must be at least 2*n. For better performance, try using lwork =n*blocksize, where blocksize is the optimal block size for ?hetrf.





542


?spsvComputes the solution to the system of linearequations with a real or complex symmetric matrixA stored in packed format, and multiple right-handsides.

Syntax

Fortran 77:

call sspsv( uplo, n, nrhs, ap, ipiv, b, ldb, info )

call dspsv( uplo, n, nrhs, ap, ipiv, b, ldb, info )

call cspsv( uplo, n, nrhs, ap, ipiv, b, ldb, info )

call zspsv( uplo, n, nrhs, ap, ipiv, b, ldb, info )

Fortran 95:

call spsv( a, b [,uplo] [,ipiv] [,info] )

Description

This routine solves for X the real or complex system of linear equations A*X = B, where A isan n-by-n symmetric matrix stored in packed format, the columns of matrix B are individualright-hand sides, and the columns of X are the corresponding solutions.

The diagonal pivoting method is used to factor A as A = U*D*UT or A = L*D*LT , where U (orL) is a product of permutation and unit upper (lower) triangular matrices, and D is symmetricand block diagonal with 1-by-1 and 2-by-2 diagonal blocks.


Input Parameters




B; nrhs ≥ 0.

nrhs

543


REAL for sspsvap, bDOUBLE PRECISION for dspsvCOMPLEX for cspsvDOUBLE COMPLEX for zspsv.Arrays: ap(*), b(ldb,*).The dimension of ap must be at least max(1,n(n+1)/2). The array apcontains the factor U or L, as specified by uplo, in packed storage(see Matrix Storage Schemes).The array b contains the matrix B whose columns are the right-handsides for the systems of equations. The second dimension of b mustbe at least max(1,nrhs).


Output Parameters

The block-diagonal matrix D and the multipliers used to obtain the factorU (or L) from the factorization of A as computed by ?sptrf, stored asa packed triangular matrix in the same storage format as A.

ap


INTEGER.ipivArray, DIMENSION at least max(1, n). Contains details of theinterchanges and the block structure of D, as determined by ?sptrf.If ipiv(i) = k > 0, then dii is a 1-by-1 block, and the i-th row andcolumn of A was interchanged with the k-th row and column.If uplo = 'U' and ipiv(i) =ipiv(i-1) = -m < 0, then D has a 2-by-2block in rows/columns i and i-1, and (i-1)-th row and column of Awas interchanged with the m-th row and column.If uplo = 'L' and ipiv(i) =ipiv(i+1) = -m < 0, then D has a 2-by-2block in rows/columns i and i+1, and (i+1)-th row and column of Awas interchanged with the m-th row and column.


544



Specific details for the routine spsv interface are as follows:


a




?spsvxUses the diagonal pivoting factorization to computethe solution to the system of linear equations witha real or complex symmetric matrix A stored inpacked format, and provides error bounds on thesolution.

Syntax

Fortran 77:

call sspsvx( fact, uplo, n, nrhs, ap, afp, ipiv, b, ldb, x, ldx, rcond, ferr,berr, work, iwork, info )

call dspsvx( fact, uplo, n, nrhs, ap, afp, ipiv, b, ldb, x, ldx, rcond, ferr,berr, work, iwork, info )

call cspsvx( fact, uplo, n, nrhs, ap, afp, ipiv, b, ldb, x, ldx, rcond, ferr,berr, work, rwork, info )

call zspsvx( fact, uplo, n, nrhs, ap, afp, ipiv, b, ldb, x, ldx, rcond, ferr,berr, work, rwork, info )

Fortran 95:

call spsvx( a, b, x [,uplo] [,af] [,ipiv] [,fact] [,ferr] [,berr] [,rcond][,info] )

545


Description

This routine uses the diagonal pivoting factorization to compute the solution to a real or complexsystem of linear equations A*X = B, where A is a n-by-n symmetric matrix stored in packedformat, the columns of matrix B are individual right-hand sides, and the columns of X are thecorresponding solutions.


The routine ?spsvx performs the following steps:

1. If fact = 'N', the diagonal pivoting method is used to factor the matrix A. The form ofthe factorization is A = U*D*UT orA = L*D*LT, where U (or L) is a product of permutationand unit upper (lower) triangular matrices, and D is symmetric and block diagonal with1-by-1 and 2-by-2 diagonal blocks.




Input Parameters

CHARACTER*1. Must be 'F' or 'N'.factSpecifies whether or not the factored form of the matrix A has beensupplied on entry.If fact = 'F': on entry, afp and ipiv contain the factored form ofA. Arrays ap, afp, and ipiv are not modified.If fact = 'N', the matrix A is copied to afp and factored.

CHARACTER*1. Must be 'U' or 'L'.uploIndicates whether the upper or lower triangular part of A is stored andhow A is factored:If uplo = 'U', the array ap stores the upper triangular part of thesymmetric matrix A, and A is factored as U*D*UT.If uplo = 'L', the array ap stores the lower triangular part of thesymmetric matrix A; A is factored as L*D*LT.

546


B; nrhs ≥ 0.

nrhs

REAL for sspsvxap,afp,b,workDOUBLE PRECISION for dspsvxCOMPLEX for cspsvxDOUBLE COMPLEX for zspsvx.Arrays: ap(*), afp(*), b(ldb,*), work(*).The array ap contains the upper or lower triangle of the symmetricmatrix A in packed storage (see Matrix Storage Schemes).The array afp is an input argument if fact = 'F'. It contains the blockdiagonal matrix D and the multipliers used to obtain the factor U or Lfrom the factorization A = U*D*UT or A = L*D*LT as computed by?sptrf, in the same storage format as A.The array b contains the matrix B whose columns are the right-handsides for the systems of equations.work(*) is a workspace array.The dimension of arrays ap and afp must be at least max(1,n(n+1)/2); the second dimension of b must be at least max(1,nrhs);the dimension of work must be at least max(1,3*n) for real flavorsand max(1,2*n) for complex flavors.


INTEGER.ipivArray, DIMENSION at least max(1, n). The array ipiv is an inputargument if fact = 'F'. It contains details of the interchanges andthe block structure of D, as determined by ?sptrf.If ipiv(i) = k > 0, then dii is a 1-by-1 diagonal block, and the i-throw and column of A was interchanged with the k-th row and column.If uplo = 'U' and ipiv(i) =ipiv(i-1) = -m < 0, then D has a 2-by-2block in rows/columns i and i-1, and (i-1)-th row and column of Awas interchanged with the m-th row and column.If uplo = 'L' and ipiv(i) =ipiv(i+1) = -m < 0, then D has a 2-by-2block in rows/columns i and i+1, and (i+1)-th row and column of Awas interchanged with the m-th row and column.


ldx

547


iwork

REAL for cspsvxrworkDOUBLE PRECISION for zspsvx.Workspace array, DIMENSION at least max(1, n); used in complexflavors only.

Output Parameters

REAL for sspsvxxDOUBLE PRECISION for dspsvxCOMPLEX for cspsvxDOUBLE COMPLEX for zspsvx.Array, DIMENSION (ldx,*).If info = 0 or info = n+1, the array x contains the solution matrixX to the system of equations. The second dimension of x must be atleast max(1,nrhs).

These arrays are output arguments if fact = 'N'. See the descriptionof afp, ipiv in Input Arguments section.

afp, ipiv

REAL for single precision flavors.rcondDOUBLE PRECISION for double precision flavors.An estimate of the reciprocal condition number of the matrix A. If rcondis less than the machine precision (in particular, if rcond = 0), thematrix is singular to working precision. This condition is indicated by areturn code of info > 0.



If info = i, and i ≤ n, then dii is exactly zero. The factorization hasbeen completed, but the block diagonal matrix D is exactly singular, sothe solution and error bounds could not be computed; rcond = 0 isreturned.

548


If info = i, and i = n + 1, then D is nonsingular, but rcond is lessthan machine precision, meaning that the matrix is singular to workingprecision. Nevertheless, the solution and error bounds are computedbecause there are a number of situations where the computed solutioncan be more accurate than the value of rcond would suggest.



Specific details for the routine spsvx interface are as follows:


a




af






fact

549


?hpsvComputes the solution to the system of linearequations with a Hermitian matrix A stored inpacked format, and multiple right-hand sides.

Syntax

Fortran 77:

call chpsv( uplo, n, nrhs, ap, ipiv, b, ldb, info )

call zhpsv( uplo, n, nrhs, ap, ipiv, b, ldb, info )

Fortran 95:

call hpsv( a, b [,uplo] [,ipiv] [,info] )

Description

This routine solves for X the system of linear equations A*X = B, where A is an n-by-n Hermitianmatrix stored in packed format, the columns of matrix B are individual right-hand sides, andthe columns of X are the corresponding solutions.

The diagonal pivoting method is used to factor A as A = U*D*UH or A = L*D*LH, where U (orL) is a product of permutation and unit upper (lower) triangular matrices, and D is Hermitianand block diagonal with 1-by-1 and 2-by-2 diagonal blocks.


Input Parameters




B; nrhs ≥ 0.

nrhs

COMPLEX for chpsvap, bDOUBLE COMPLEX for zhpsv.Arrays: ap(*), b(ldb,*).

550


The dimension of ap must be at least max(1,n(n+1)/2). The array apcontains the factor U or L, as specified by uplo, in packed storage(see Matrix Storage Schemes).The array b contains the matrix B whose columns are the right-handsides for the systems of equations. The second dimension of b mustbe at least max(1,nrhs).


Output Parameters

The block-diagonal matrix D and the multipliers used to obtain the factorU (or L) from the factorization of A as computed by ?hptrf, stored asa packed triangular matrix in the same storage format as A.

ap


INTEGER.ipivArray, DIMENSION at least max(1, n). Contains details of theinterchanges and the block structure of D, as determined by ?hptrf.If ipiv(i) = k > 0, then dii is a 1-by-1 block, and the i-th row andcolumn of A was interchanged with the k-th row and column.If uplo = 'U' and ipiv(i) =ipiv(i-1) = -m < 0, then D has a 2-by-2block in rows/columns i and i-1, and (i-1)-th row and column of Awas interchanged with the m-th row and column.If uplo = 'L' and ipiv(i) =ipiv(i+1) = -m < 0, then D has a 2-by-2block in rows/columns i and i+1, and (i+1)-th row and column of Awas interchanged with the m-th row and column.




Specific details for the routine hpsv interface are as follows:

551


a




?hpsvxUses the diagonal pivoting factorization to computethe solution to the system of linear equations witha Hermitian matrix A stored in packed format, andprovides error bounds on the solution.

Syntax

Fortran 77:

call chpsvx( fact, uplo, n, nrhs, ap, afp, ipiv, b, ldb, x, ldx, rcond, ferr,berr, work, rwork, info )

call zhpsvx( fact, uplo, n, nrhs, ap, afp, ipiv, b, ldb, x, ldx, rcond, ferr,berr, work, rwork, info )

Fortran 95:

call hpsvx( a, b, x [,uplo] [,af] [,ipiv] [,fact] [,ferr] [,berr] [,rcond][,info] )

Description

This routine uses the diagonal pivoting factorization to compute the solution to a complexsystem of linear equations A*X = B, where A is a n-by-n Hermitian matrix stored in packedformat, the columns of matrix B are individual right-hand sides, and the columns of X are thecorresponding solutions.


The routine ?hpsvx performs the following steps:

1. If fact = 'N', the diagonal pivoting method is used to factor the matrix A. The form of thefactorization is A = U*D*UH or A = L*D*LH, where U (or L) is a product of permutation andunit upper (lower) triangular matrices, and D is a Hermitian and block diagonal with 1-by-1and 2-by-2 diagonal blocks.

552





Input Parameters

CHARACTER*1. Must be 'F' or 'N'.factSpecifies whether or not the factored form of the matrix A has beensupplied on entry.If fact = 'F': on entry, afp and ipiv contain the factored form ofA. Arrays ap, afp, and ipiv are not modified.If fact = 'N', the matrix A is copied to afp and factored.

CHARACTER*1. Must be 'U' or 'L'.uploIndicates whether the upper or lower triangular part of A is stored andhow A is factored:If uplo = 'U', the array ap stores the upper triangular part of theHermitian matrix A, and A is factored as U*D*UH.If uplo = 'L', the array ap stores the lower triangular part of theHermitian matrix A, and A is factored as L*D*LH.



B; nrhs ≥ 0.

nrhs

COMPLEX for chpsvxap,afp,b,workDOUBLE COMPLEX for zhpsvx.Arrays: ap(*), afp(*), b(ldb,*), work(*).The array ap contains the upper or lower triangle of the Hermitianmatrix A in packed storage (see Matrix Storage Schemes).The array afp is an input argument if fact = 'F'. It contains the blockdiagonal matrix D and the multipliers used to obtain the factor U or Lfrom the factorization A = U*D*UH or A = L*D*LH as computed by?hptrf, in the same storage format as A.

553


The array b contains the matrix B whose columns are the right-handsides for the systems of equations.work(*) is a workspace array.The dimension of arrays ap and afp must be at least max(1,n(n+1)/2);the second dimension of b must be at least max(1,nrhs); the dimensionof work must be at least max(1,2*n).


INTEGER.ipivArray, DIMENSION at least max(1, n). The array ipiv is an inputargument if fact = 'F'. It contains details of the interchanges andthe block structure of D, as determined by ?hptrf.If ipiv(i) = k > 0, then dii is a 1-by-1 diagonal block, and the i-throw and column of A was interchanged with the k-th row and column.If uplo = 'U' and ipiv(i) =ipiv(i-1) = -m < 0, then D has a 2-by-2block in rows/columns i and i-1, and (i-1)-th row and column of Awas interchanged with the m-th row and column.If uplo = 'L' and ipiv(i) =ipiv(i+1) = -m < 0, then D has a 2-by-2block in rows/columns i and i+1, and (i+1)-th row and column of Awas interchanged with the m-th row and column.


ldx

REAL for chpsvxrworkDOUBLE PRECISION for zhpsvx.Workspace array, DIMENSION at least max(1, n).

Output Parameters

COMPLEX for chpsvxxDOUBLE COMPLEX for zhpsvx.Array, DIMENSION (ldx,*).If info = 0 or info = n+1, the array x contains the solution matrixX to the system of equations. The second dimension of x must be atleast max(1,nrhs).

These arrays are output arguments if fact = 'N'. See the descriptionof afp, ipiv in Input Arguments section.

afp, ipiv

REAL for chpsvxrcond

554

DOUBLE PRECISION for zhpsvx.An estimate of the reciprocal condition number of the matrix A. If rcondis less than the machine precision (in particular, if rcond = 0), thematrix is singular to working precision. This condition is indicated by areturn code of info > 0.

REAL for chpsvxferr, berrDOUBLE PRECISION for zhpsvx.Arrays, DIMENSION at least max(1, nrhs). Contain the component-wiseforward and relative backward errors, respectively, for each solutionvector.


If info = i, and i ≤ n, then dii is exactly zero. The factorization hasbeen completed, but the block diagonal matrix D is exactly singular, sothe solution and error bounds could not be computed; rcond = 0 isreturned.If info = i, and i = n + 1, then D is nonsingular, but rcond is lessthan machine precision, meaning that the matrix is singular to workingprecision. Nevertheless, the solution and error bounds are computedbecause there are a number of situations where the computed solutioncan be more accurate than the value of rcond would suggest.



Specific details for the routine hpsvx interface are as follows:


a



Stands for argument ap in Fortan 77 interface. Holds the array AF ofsize (n*(n+1)/2).

af



555





fact

556


4LAPACK Routines: Least Squaresand Eigenvalue Problems

This chapter describes the Intel® Math Kernel Library implementation of routines from the LAPACK packagethat are used for solving linear least-squares problems, eigenvalue and singular value problems, as wellas performing a number of related computational tasks.

Sections in this chapter include descriptions of LAPACK computational routines and driver routines. Forfull reference on LAPACK routines and related information see [LUG].

Least-Squares Problems. A typical least-squares problem is as follows: given a matrix A and a

vector b, find the vector x that minimizes the sum of squares σi((Ax)i - bi)2 or, equivalently, find the

vector x that minimizes the 2-norm ||Ax - b||2.

In the most usual case, A is an m-by-n matrix with m < n and rank(A) = n. This problem is also referredto as finding the least-squares solution to an overdetermined system of linear equations (here wehave more equations than unknowns). To solve this problem, you can use the QR factorization of thematrix A (see QR Factorization).

If m < n and rank(A) = m, there exist an infinite number of solutions x which exactly satisfy Ax = b,and thus minimize the norm ||Ax - b||2. In this case it is often useful to find the unique solution thatminimizes ||x||2. This problem is referred to as finding the minimum-norm solution to anunderdetermined system of linear equations (here we have more unknowns than equations). To solvethis problem, you can use the LQ factorization of the matrix A (see LQ Factorization).

In the general case you may have a rank-deficient least-squares problem, with rank(A)< min(m,n): find the minimum-norm least-squares solution that minimizes both ||x||2 and ||Ax - b||2.In this case (or when the rank of A is in doubt) you can use the QR factorization with pivoting or singularvalue decomposition (see Singular Value Decomposition).

Eigenvalue Problems. The eigenvalue problems (from German eigen “own”) are stated as follows:

given a matrix A, find the eigenvalues λ and the corresponding eigenvectors z that satisfy the equation

Az = λz (right eigenvectors z)

or the equation

zHA = λzH (left eigenvectors z).

If A is a real symmetric or complex Hermitian matrix, the above two equations are equivalent, and theproblem is called a symmetric eigenvalue problem. Routines for solving this type of problems are describedin the section Symmetric Eigenvalue Problems .

Routines for solving eigenvalue problems with nonsymmetric or non-Hermitian matrices are described inthe section Nonsymmetric Eigenvalue Problems .

557

The library also includes routines that handle generalized symmetric-definite eigenvalue

problems: find the eigenvalues λ and the corresponding eigenvectors x that satisfy one of thefollowing equations:

Az = λBz, ABz = λz, or BAz = λz,

where A is symmetric or Hermitian, and B is symmetric positive-definite or Hermitian positive-definite.Routines for reducing these problems to standard symmetric eigenvalue problems are described inthe section Generalized Symmetric-Definite Eigenvalue Problems .

To solve a particular problem, you usually call several computational routines. Sometimes you needto combine the routines of this chapter with other LAPACK routines described in Chapter 3 as wellas with BLAS routines described in Chapter 2 .

For example, to solve a set of least-squares problems minimizing ||Ax - b||2 for all columns b ofa given matrix B (where A and B are real matrices), you can call ?geqrf to form the factorization A= QR, then call ?ormqr to compute C = QHB and finally call the BLAS routine ?trsm to solve for Xthe system of equations RX = C.

Another way is to call an appropriate driver routine that performs several tasks in one call. Forexample, to solve the least-squares problem the driver routine ?gels can be used.

WARNING. LAPACK routines expect that input matrices do not contain INF or NaNvalues. When input data is inappropriate for LAPACK, problems may arise, includingpossible hangs.

Starting from release 8.0, Intel MKL along with Fortran-77 interface to LAPACK computational anddriver routines supports also Fortran-95 interface which uses simplified routine calls with shorterargument lists. The calling sequence for Fortran-95 interface is given in the syntax section of theroutine description immediately after Fortran-77 calls.

Routine Naming ConventionsFor each routine in this chapter, when calling it from the Fortran-77 program you can use theLAPACK name.

LAPACK names have the structure xyyzzz, which is described below.

The initial letter x indicates the data type:




558



The second and third letters yy indicate the matrix type and storage scheme:

bidiagonal matrixbd

general matrixge

general band matrixgb

upper Hessenberg matrixhs

(real) orthogonal matrixor

(real) orthogonal matrix (packed storage)op

(complex) unitary matrixun

(complex) unitary matrix (packed storage)up

symmetric or Hermitian positive-definite tridiagonal matrixpt

symmetric matrixsy

symmetric matrix (packed storage)sp

(real) symmetric band matrixsb

(real) symmetric tridiagonal matrixst

Hermitian matrixhe

Hermitian matrix (packed storage)hp

(complex) Hermitian band matrixhb

triangular or quasi-triangular matrix.tr

The last three letters zzz indicate the computation performed, for example:

form the QR factorizationqrf

form the LQ factorization.lqf

Thus, the routine sgeqrf forms the QR factorization of general real matrices in single precision;the corresponding routine for complex matrices is cgeqrf.

Names of the LAPACK computational and driver routines for Fortran-95 interface in Intel MKLare the same as Fortran-77 names but without the first letter that indicates the data type. Forexample, the name of the routine that forms the QR factorization of general real matrices inFortran-95 interface is geqrf. Handling of different data types is done through defining a specificinternal parameter referring to a module block with named constants for single and doubleprecision.

559

LAPACK Routines: Least Squares and Eigenvalue Problems 4

For details on the design of Fortran-95 interface for LAPACK computational and driver routinesin Intel MKL and for the general information on how the optional arguments are reconstructed,see Fortran-95 Interface Conventions in chapter 3 .

Matrix Storage SchemesLAPACK routines use the following matrix storage schemes:

• Full storage: a matrix A is stored in a two-dimensional array a, with the matrix elementaij stored in the array element a(i,j).


• Band storage: an m-by-n band matrix with kl sub-diagonals and ku super-diagonals isstored compactly in a two-dimensional array ab with kl+ku+1 rows and n columns. Columnsof the matrix are stored in the corresponding columns of the array, and diagonals of thematrix are stored in rows of the array.

In Chapters 3 and 4 , arrays that hold matrices in packed storage have names ending in p;arrays with matrices in band storage have names ending in b. For more information on matrixstorage schemes, see “Matrix Arguments” in Appendix B .

Mathematical NotationIn addition to the mathematical notation used in previous chapters, descriptions of routines inthis chapter use the following notation:

Eigenvalues of the matrix A (for the definition of eigenvalues, seeEigenvalue Problems).

λi

Singular values of the matrix A. They are equal to square roots ofthe eigenvalues of AHA. (For more information, see Singular ValueDecomposition).

σi

The 2-norm of the vector x: ||x||2 = (Σi|xi|2)1/2 = ||x||E .||x||2

The 2-norm (or spectral norm) of the matrix A.||A||2

||A||2 = maxiσi, ||A|| = max|x|=1(Ax·Ax).

560


The Euclidean norm of the matrix A: ||A|| = ΣiΣj|aij|2 (forvectors, the Euclidean norm and the 2-norm are equal: ||x||E =||x||2).

||A||E

The acute angle between vectors x and y:q(x, y)cos q(x, y) = |x·y| / (||x||2||y||2).

Computational RoutinesIn the sections that follow, the descriptions of LAPACK computational routines are given. Theseroutines perform distinct computational tasks that can be used for:

Orthogonal Factorizations

Singular Value Decomposition

Symmetric Eigenvalue Problems

Generalized Symmetric-Definite Eigenvalue Problems

Nonsymmetric Eigenvalue Problems

Generalized Nonsymmetric Eigenvalue Problems

Generalized Singular Value Decomposition

See also the respective driver routines.


This section describes the LAPACK routines for the QR (RQ) and LQ (QL) factorization ofmatrices. Routines for the RZ factorization as well as for generalized QR and RQ factorizationsare also included.

QR Factorization. Assume that A is an m-by-n matrix to be factored.

If m ≤ n, the QR factorization is given by

561


where R is an n-by-n upper triangular matrix with real diagonal elements, and Q is an m-by-morthogonal (or unitary) matrix.

You can use the QR factorization for solving the following least-squares problem: minimize ||Ax- b||2 where A is a full-rank m-by-n matrix(m < n). After factoring the matrix, compute thesolution x by solving Rx = (Q1)

Tb.

If m < n, the QR factorization is given by

A = QR = Q(R1R2)

where R is trapezoidal, R1 is upper triangular and R2 is rectangular.

The LAPACK routines do not form the matrix Q explicitly. Instead, Q is represented as a productof min(m, n) elementary reflectors. Routines are provided to work with Q in thisrepresentation.

LQ Factorization LQ factorization of an m-by-n matrix A is as follows. If m ≤ n,

where L is an m-by-m lower triangular matrix with real diagonal elements, and Q is an n-by-northogonal (or unitary) matrix.

If m > n, the LQ factorization is

where L1 is an n-by-n lower triangular matrix, L2 is rectangular, and Q is an n-by-n orthogonal(or unitary) matrix.

You can use the LQ factorization to find the minimum-norm solution of an underdeterminedsystem of linear equations Ax = b where A is an m-by-n matrix of rank m (m < n). Afterfactoring the matrix, compute the solution vector x as follows: solve Ly = b for y, and thencompute x = (Q1)

Hy.

Table 4-1 lists LAPACK routines (Fortran-77 interface) that perform orthogonal factorizationof matrices. Respective routine names in Fortran-95 interface are without the first symbol (seeRoutine Naming Conventions).

562

Table 4-1 Computational Routines for Orthogonal Factorization

Applymatrix Q

Generatematrix Q

Factorizewith pivoting

Factorizewithout pivoting

Matrix type, factorization

?ormqr?unmqr

?orgqr?ungqr

?geqpf?geqp3

?geqrfgeneral matrices, QRfactorization

?ormrq?unmrq

?orgrq?ungrq

?gerqfgeneral matrices, RQfactorization

?ormlq?unmlq

?orglq?unglq

?gelqfgeneral matrices, LQfactorization

?ormql?unmql

?orgql?ungql

?geqlfgeneral matrices, QLfactorization

?ormrz?unmrz

?tzrzftrapezoidal matrices, RZfactorization

?ggqrfpair of matrices, generalized QRfactorization

?ggrqfpair of matrices, generalized RQfactorization

?geqrfComputes the QR factorization of a general m-by-nmatrix.

Syntax

Fortran 77:

call sgeqrf(m, n, a, lda, tau, work, lwork, info)

call dgeqrf(m, n, a, lda, tau, work, lwork, info)

call cgeqrf(m, n, a, lda, tau, work, lwork, info)

call zgeqrf(m, n, a, lda, tau, work, lwork, info)

563


Fortran 95:

call geqrf(a [, tau] [,info])

Description

The routine forms the QR factorization of a general m-by-n matrix A (see OrthogonalFactorizations). No pivoting is performed.

The routine does not form the matrix Q explicitly. Instead, Q is represented as a product ofmin(m, n) elementary reflectors. Routines are provided to work with Q in this representation.

Input Parameters


INTEGER. The number of columns in A (n ≥ 0).n

REAL for sgeqrfa, workDOUBLE PRECISION for dgeqrfCOMPLEX for cgeqrfDOUBLE COMPLEX for zgeqrf.Arrays: a(lda,*) contains the matrix A. The seconddimension of a must be at least max(1, n).work is a workspace array, its dimension max(1, lwork).

INTEGER. The first dimension of a; at least max(1, m).lda


If lwork = -1, then a workspace query is assumed; theroutine only calculates the optimal size of the work array,returns this value as the first entry of the work array, andno error message related to lwork is issued by xerbla.See Application Notes for the suggested value of lwork.

Output Parameters

Overwritten by the factorization data as follows:a

If m ≥ n, the elements below the diagonal are overwrittenby the details of the unitary matrix Q, and the upper triangleis overwritten by the corresponding elements of the uppertriangular matrix R.

564


If m < n, the strictly lower triangular part is overwritten bythe details of the unitary matrix Q, and the remainingelements are overwritten by the corresponding elements ofthe m-by-n upper trapezoidal matrix R.

REAL for sgeqrftauDOUBLE PRECISION for dgeqrfCOMPLEX for cgeqrfDOUBLE COMPLEX for zgeqrf.Array, DIMENSION at least max (1, min(m, n)). Containsadditional information on the matrix Q.

If info = 0, on exit work(1) contains the minimum valueof lwork required for optimum performance. Use this lworkfor subsequent runs.

work(1)



Routines in Fortran 95 interface have fewer arguments in the calling sequence than their Fortran77 counterparts. For general conventions applied to skip redundant or restorable arguments,see Fortran-95 Interface Conventions.

Specific details for the routine geqrf interface are the following:


Holds the vector of length min(m,n)tau

Application Notes




565

The computed factorization is the exact factorization of a matrix A + E, where

||E||2 = O(ε)||A||2.


if m = n,(4/3)n3

if m > n,(2/3)n2(3m-n)

if m < n.(2/3)m2(3n-m)

The number of operations for complex flavors is 4 times greater.

To solve a set of least-squares problems minimizing ||Ax - b||2 for all columns b of a givenmatrix B, you can call the following:

to factorize A = QR;?geqrf (this routine)

to compute C = QT*B (for real matrices);?ormqr

to compute C = QH*B (for complex matrices);?unmqr

(a BLAS routine) to solve R*X = C.?trsm

(The columns of the computed X are the least-squares solution vectors x.)

To compute the elements of Q explicitly, call

(for real matrices)?orgqr

(for complex matrices).?ungqr

566

?geqpfComputes the QR factorization of a general m-by-nmatrix with pivoting.

Syntax

Fortran 77:

call sgeqpf(m, n, a, lda, jpvt, tau, work, info)

call dgeqpf(m, n, a, lda, jpvt, tau, work, info)

call cgeqpf(m, n, a, lda, jpvt, tau, work, rwork, info)

call zgeqpf(m, n, a, lda, jpvt, tau, work, rwork, info)

Fortran 95:

call geqpf(a, jpvt [,tau] [,info])

Description

This routine is deprecated and has been replaced by routine ?geqp3.

The routine ?geqpf forms the QR factorization of a general m-by-n matrix A with column pivoting:A*P = Q*R (see Orthogonal Factorizations). Here P denotes an n-by-n permutation matrix.


Input Parameters



REAL for sgeqpfa, workDOUBLE PRECISION for dgeqpfCOMPLEX for cgeqpfDOUBLE COMPLEX for zgeqpf.Arrays: a (lda,*) contains the matrix A. The seconddimension of a must be at least max(1, n).

567


work (lwork) is a workspace array. The size of the workarray must be at least max(1, 3*n) for real flavors and atleast max(1, n) for complex flavors.


INTEGER. Array, DIMENSION at least max(1, n).jpvtOn entry, if jpvt(i) > 0, the i-th column of A is movedto the beginning of A*P before the computation, and fixedin place during the computation.If jpvt(i) = 0, the ith column of A is a free column (thatis, it may be interchanged during the computation with anyother free column).

REAL for cgeqpfrworkDOUBLE PRECISION for zgeqpf.A workspace array, DIMENSION at least max(1, 2*n).

Output Parameters


If m ≥ n, the elements below the diagonal are overwrittenby the details of the unitary (orthogonal) matrix Q, and theupper triangle is overwritten by the corresponding elementsof the upper triangular matrix R.If m < n, the strictly lower triangular part is overwritten bythe details of the matrix Q, and the remaining elements areoverwritten by the corresponding elements of the m-by-nupper trapezoidal matrix R.

REAL for sgeqpftauDOUBLE PRECISION for dgeqpfCOMPLEX for cgeqpfDOUBLE COMPLEX for zgeqpf.Array, DIMENSION at least max (1, min(m, n)). Containsadditional information on the matrix Q.

Overwritten by details of the permutation matrix P in thefactorization A*P = Q*R. More precisely, the columns ofA*P are the columns of A in the following order:

jpvt

jpvt(1), jpvt(2), ..., jpvt(n).

INTEGER.info

568

If info = 0, the execution is successful.If info = -i, the i-th parameter had an illegal value.



Specific details for the routine geqpf interface are the following:


Holds the vector of length (n).jpvt


Application Notes


||E||2 = O(ε)||A||2.


if m = n,(4/3)n3

if m > n,(2/3)n2(3m-n)

if m < n.(2/3)m2(3n-m)



to factorize A*P = Q*R;?geqpf (this routine)



to solve R*X = C.?trsm (a BLASroutine)

(The columns of the computed X are the permuted least-squares solution vectors x; the outputarray jpvt specifies the permutation order.)


569



?geqp3Computes the QR factorization of a general m-by-nmatrix with column pivoting using Level 3 BLAS.

Syntax

Fortran 77:

call sgeqp3(m, n, a, lda, jpvt, tau, work, lwork, info)

call dgeqp3(m, n, a, lda, jpvt, tau, work, lwork, info)

call cgeqp3(m, n, a, lda, jpvt, tau, work, lwork, rwork, info)

call zgeqp3(m, n, a, lda, jpvt, tau, work, lwork, rwork, info)

Fortran 95:

call geqp3(a, jpvt [,tau] [,info])

Description

The routine forms the QR factorization of a general m-by-n matrix A with column pivoting: AP= Q*R (see Orthogonal Factorizations) using Level 3 BLAS. Here P denotes an n-by-n permutationmatrix. Use this routine instead of ?geqpf for better performance.


Input Parameters



REAL for sgeqp3a, workDOUBLE PRECISION for dgeqp3COMPLEX for cgeqp3DOUBLE COMPLEX for zgeqp3.Arrays:a (lda,*) contains the matrix A.

570


The second dimension of a must be at least max(1, n).work is a workspace array, its dimension max(1,lwork).


INTEGER. The size of the work array; must be at leastmax(1, 3*n+1) for real flavors, and at least max(1, n+1)for complex flavors.

lwork

If lwork = -1, then a workspace query is assumed; theroutine only calculates the optimal size of the work array,returns this value as the first entry of the work array, andno error message related to lwork is issued by xerbla. SeeApplication Notes below for details.

INTEGER.jpvtArray, DIMENSION at least max(1, n).

On entry, if jpvt(i) ≠ 0, the ith column of A is moved tothe beginning of AP before the computation, and fixed inplace during the computation.If jpvt(i) = 0, the i-th column of A is a free column (thatis, it may be interchanged during the computation with anyother free column).

REAL for cgeqp3rworkDOUBLE PRECISION for zgeqp3.A workspace array, DIMENSION at least max(1, 2*n). Usedin complex flavors only.

Output Parameters


If m ≥ n, the elements below the diagonal are overwrittenby the details of the unitary (orthogonal) matrix Q, and theupper triangle is overwritten by the corresponding elementsof the upper triangular matrix R.If m < n, the strictly lower triangular part is overwritten bythe details of the matrix Q, and the remaining elements areoverwritten by the corresponding elements of the m-by-nupper trapezoidal matrix R.

REAL for sgeqp3tau

571

DOUBLE PRECISION for dgeqp3COMPLEX for cgeqp3DOUBLE COMPLEX for zgeqp3.Array, DIMENSION at least max (1, min(m, n)). Containsscalar factors of the elementary reflectors for the matrix Q.

Overwritten by details of the permutation matrix P in thefactorization AP = Q*R. More precisely, the columns of APare the columns of A in the following order:

jpvt

jpvt(1), jpvt(2), ..., jpvt(n).




Specific details for the routine geqp3 interface are the following:


Holds the vector of length (n).jpvt


Application Notes


to factorize AP = Q*R;?geqp3 (this routine)



to solve R*X = C.?trsm (a BLASroutine)

(The columns of the computed X are the permuted least-squares solution vectors x; the outputarray jpvt specifies the permutation order.)


572








?orgqrGenerates the real orthogonal matrix Q of the QRfactorization formed by ?geqrf.

Syntax

Fortran 77:

call sorgqr(m, n, k, a, lda, tau, work, lwork, info)

call dorgqr(m, n, k, a, lda, tau, work, lwork, info)

Fortran 95:

call orgqr(a, tau [,info])

Description

The routine generates the whole or part of m-by-m orthogonal matrix Q of the QR factorizationformed by the routines sgeqrf/dgeqrf or sgeqpf/dgeqpf. Use this routine after a call tosgeqrf/dgeqrf or sgeqpf/dgeqpf.

573


Usually Q is determined from the QR factorization of an m by p matrix A with m ≥ p. To computethe whole matrix Q, use:

call ?orgqr(m, m, p, a, lda, tau, work, lwork, info)

To compute the leading p columns of Q (which form an orthonormal basis in the space spannedby the columns of A):

call ?orgqr(m, p, p, a, lda, tau, work, lwork, info)

To compute the matrix Qk of the QR factorization of A's leading k columns:

call ?orgqr(m, m, k, a, lda, tau, work, lwork, info)

To compute the leading k columns of Qk (which form an orthonormal basis in the space spannedby A's leading k columns):

call ?orgqr(m, k, k, a, lda, tau, work, lwork, info)

Input Parameters

INTEGER. The order of the orthogonal matrix Q (m ≥ 0).m

INTEGER. The number of columns of Q to be computedn

(0 ≤ n ≤ m).

INTEGER. The number of elementary reflectors whose

product defines the matrix Q (0 ≤ k ≤ n).

k

REAL for sorgqra, tau, workDOUBLE PRECISION for dorgqrArrays:a(lda,*) and tau(*) are the arrays returned by sgeqrf /dgeqrf or sgeqpf / dgeqpf.The second dimension of a must be at least max(1, n).The dimension of tau must be at least max(1, k).work is a workspace array, its dimension max(1, lwork).




574


Output Parameters

Overwritten by n leading columns of the m-by-m orthogonalmatrix Q.

a


work(1)




Specific details for the routine orgqr interface are the following:


Holds the vector of length (k)tau

Application Notes






575


The computed Q differs from an exactly orthogonal matrix by a matrix E such that

||E||2 = O(ε)||A||2 where ε is the machine precision.

The total number of floating-point operations is approximately 4*m*n*k - 2*(m + n)*k2 +(4/3)*k3.

If n = k, the number is approximately (2/3)*n2*(3m - n).

The complex counterpart of this routine is ?ungqr.

?ormqrMultiplies a real matrix by the orthogonal matrixQ of the QR factorization formed by ?geqrf or?geqpf.

Syntax

Fortran 77:

call sormqr(side, trans, m, n, k, a, lda, tau, c, ldc, work, lwork, info)

call dormqr(side, trans, m, n, k, a, lda, tau, c, ldc, work, lwork, info)

Fortran 95:

call ormqr(a, tau, c [,side] [,trans] [,info])

Description

The routine multiplies a real matrix C by Q or Q T, where Q is the orthogonal matrix Q of the QR

factorization formed by the routines sgeqrf/dgeqrf or sgeqpf/dgeqpf.

Depending on the parameters side and trans, the routine can form one of the matrix productsQ*C, QT*C, C*Q, or C*QT (overwriting the result on C).

Input Parameters

CHARACTER*1. Must be either 'L' or 'R'.sideIf side ='L', Q or QT is applied to C from the left.If side ='R', Q or QT is applied to C from the right.

CHARACTER*1. Must be either 'N' or 'T'.transIf trans ='N', the routine multiplies C by Q.

576


If trans ='T', the routine multiplies C by QT.

INTEGER. The number of rows in the matrix C (m ≥ 0).m

INTEGER. The number of columns in C (n ≥ 0).n

INTEGER. The number of elementary reflectors whoseproduct defines the matrix Q. Constraints:

k

0 ≤ k ≤ m if side ='L';

0 ≤ k ≤ n if side ='R'.

REAL for sgeqrfa, tau, c, workDOUBLE PRECISION for dgeqrf.Arrays:a(lda,*) and tau(*) are the arrays returned by sgeqrf /dgeqrf or sgeqpf / dgeqpf. The second dimension of amust be at least max(1, k). The dimension of tau must beat least max(1, k).c(ldc,*) contains the matrix C.The second dimension of c must be at least max(1, n)work is a workspace array, its dimension max(1,lwork).

INTEGER. The first dimension of a. Constraints:lda

lda ≥ max(1, m) if side = 'L';

lda ≥ max(1, n) if side = 'R'.

INTEGER. The first dimension of c. Constraint:ldc

ldc ≥ max(1, m).

INTEGER. The size of the work array. Constraints:lwork

lwork ≥ max(1, n) if side = 'L';

lwork ≥ max(1, m) if side = 'R'.If lwork = -1, then a workspace query is assumed; theroutine only calculates the optimal size of the work array,returns this value as the first entry of the work array, andno error message related to lwork is issued by xerbla.See Application Notes for the suggested value of lwork.

577


Output Parameters

Overwritten by the product Q*C, QT*C, C*Q, or C*QT (asspecified by side and trans).

c


work(1)




Specific details for the routine ormqr interface are the following:

Holds the matrix A of size (r,k).ar = m if side = 'L'.r = n if side = 'R'.

Holds the vector of length (k).tau



Must be 'N' or 'T'. The default value is 'N'.trans

Application Notes

For better performance, try using lwork = n*blocksize (if side = 'L') or lwork =m*blocksize (if side = 'R') where blocksize is a machine-dependent value (typically, 16to 64) required for optimum performance of the blocked algorithm.



578




The complex counterpart of this routine is ?unmqr.

?ungqrGenerates the complex unitary matrix Q of the QRfactorization formed by ?geqrf.

Syntax

Fortran 77:

call cungqr(m, n, k, a, lda, tau, work, lwork, info)

call zungqr(m, n, k, a, lda, tau, work, lwork, info)

Fortran 95:

call ungqr(a, tau [,info])

Description

The routine generates the whole or part of m-by-m unitary matrix Q of the QR factorization formedby the routines cgeqrf/zgeqrf or cgeqpf/zgeqpf. Use this routine after a call tocgeqrf/zgeqrf or cgeqpf/zgeqpf.

Usually Q is determined from the QR factorization of an m by p matrix A with m ≥ p. To computethe whole matrix Q, use:

call ?ungqr(m, m, p, a, lda, tau, work, lwork, info)

To compute the leading p columns of Q (which form an orthonormal basis in the space spannedby the columns of A):

call ?ungqr(m, p, p, a, lda, tau, work, lwork, info)

To compute the matrix Qk of the QR factorization of A's leading k columns:

call ?ungqr(m, m, k, a, lda, tau, work, lwork, info)

579


To compute the leading k columns of Qk (which form an orthonormal basis in the space spannedby A's leading k columns):

call ?ungqr(m, k, k, a, lda, tau, work, lwork, info)

Input Parameters

INTEGER. The order of the unitary matrix Q (m ≥ 0).m

INTEGER. The number of columns of Q to be computedn

(0 ≤ n ≤ m).


product defines the matrix Q (0 ≤ k ≤ n).

k

COMPLEX for cungqra, tau, workDOUBLE COMPLEX for zungqrArrays: a(lda,*) and tau(*) are the arrays returned bycgeqrf/zgeqrf or cgeqpf/zgeqpf.The second dimension of a must be at least max(1, n).The dimension of tau must be at least max(1, k).work is a workspace array, its dimension max(1, lwork).




Output Parameters

Overwritten by n leading columns of the m-by-m unitarymatrix Q.

a


work(1)


580




Specific details for the routine ungqr interface are the following:



Application Notes

For better performance, try using lwork =n*blocksize, where blocksize is amachine-dependent value (typically, 16 to 64) required for optimum performance of the blockedalgorithm.





The computed Q differs from an exactly unitary matrix by a matrix E such that ||E||2 =

O(ε)||A||2, where ε is the machine precision.

The total number of floating-point operations is approximately 16*m*n*k - 8*(m + n)*k2+ (16/3)*k3.

If n = k, the number is approximately (8/3)*n2*(3m - n).

The real counterpart of this routine is ?orgqr.

581


?unmqrMultiplies a complex matrix by the unitary matrixQ of the QR factorization formed by ?geqrf.

Syntax

Fortran 77:

call cunmqr(side, trans, m, n, k, a, lda, tau, c, ldc, work, lwork, info)

call zunmqr(side, trans, m, n, k, a, lda, tau, c, ldc, work, lwork, info)

Fortran 95:

call unmqr(a, tau, c [,side] [,trans] [,info])

Description

The routine multiplies a rectangular complex matrix C by Q or QH, where Q is the unitary matrixQ of the QR factorization formed by the routines cgeqrf/zgeqrf or cgeqpf/zgeqpf.

Depending on the parameters side and trans, the routine can form one of the matrix productsQ*C, QH*C, C*Q, or C*QH (overwriting the result on C).

Input Parameters

CHARACTER*1. Must be either 'L' or 'R'.sideIf side = 'L', Q or QH is applied to C from the left.If side = 'R', Q or QH is applied to C from the right.

CHARACTER*1. Must be either 'N' or 'C'.transIf trans = 'N', the routine multiplies C by Q.If trans = 'C', the routine multiplies C by QH.




k

0 ≤ k ≤ m if side = 'L';

0 ≤ k ≤ n if side = 'R'.

COMPLEX for cgeqrfa, c, tau, work

582


DOUBLE COMPLEX for zgeqrf.Arrays:a(lda,*) and tau(*) are the arrays returned by cgeqrf /zgeqrf or cgeqpf / zgeqpf.The second dimension of a must be at least max(1, k).The dimension of tau must be at least max(1, k).c(ldc,*) contains the matrix C.The second dimension of c must be at least max(1, n)work is a workspace array, its dimension max(1, lwork).


lda ≥ max(1, m) if side = 'L';

lda ≥ max(1, n) if side = 'R'.

INTEGER. The first dimension of c. Constraint:ldc

ldc ≥ max(1, m).



lwork ≥ max(1, m) if side = 'R'.If lwork = -1, then a workspace query is assumed; theroutine only calculates the optimal size of the work array,returns this value as the first entry of the work array, andno error message related to lwork is issued by xerbla.See Application notes for the suggested value of lwork.

Output Parameters

Overwritten by the product Q*C, QH*C, C*Q, or C*QH (asspecified by side and trans).

c


work(1)


583




Specific details for the routine unmqr interface are the following:






Application Notes






The real counterpart of this routine is ?ormqr.

584


?gelqfComputes the LQ factorization of a general m-by-nmatrix.

Syntax

Fortran 77:

call sgelqf(m, n, a, lda, tau, work, lwork, info)

call dgelqf(m, n, a, lda, tau, work, lwork, info)

call cgelqf(m, n, a, lda, tau, work, lwork, info)

call zgelqf(m, n, a, lda, tau, work, lwork, info)

Fortran 95:

call gelqf(a [, tau] [,info])

Description

The routine forms the LQ factorization of a general m-by-n matrix A (see OrthogonalFactorizations). No pivoting is performed.


Input Parameters



REAL for sgelqfa, workDOUBLE PRECISION for dgelqfCOMPLEX for cgelqfDOUBLE COMPLEX for zgelqf.Arrays:a(lda,*) contains the matrix A.The second dimension of a must be at least max(1, n).work is a workspace array, its dimension max(1, lwork).


585


INTEGER. The size of the work array; at least max(1, m).lworkIf lwork = -1, then a workspace query is assumed; theroutine only calculates the optimal size of the work array,returns this value as the first entry of the work array, andno error message related to lwork is issued by xerbla.See Application Notes for the suggested value of lwork.

Output Parameters


If m ≤ n, the elements above the diagonal are overwrittenby the details of the unitary (orthogonal) matrix Q, and thelower triangle is overwritten by the corresponding elementsof the lower triangular matrix L.If m > n, the strictly upper triangular part is overwritten bythe details of the matrix Q, and the remaining elements areoverwritten by the corresponding elements of the m-by-nlower trapezoidal matrix L.

REAL for sgelqftauDOUBLE PRECISION for dgelqfCOMPLEX for cgelqfDOUBLE COMPLEX for zgelqf.Array, DIMENSION at least max(1, min(m, n)).Contains additional information on the matrix Q.


work(1)




Specific details for the routine gelqf interface are the following:


586


Holds the vector of length min(m,n).tau

Application Notes

For better performance, try using lwork =m*blocksize, where blocksize is amachine-dependent value (typically, 16 to 64) required for optimum performance of the blockedalgorithm.






||E||2 = O(ε) ||A||2.


if m = n,(4/3)n3

if m > n,(2/3)n2(3m-n)

if m < n.(2/3)m2(3n-m)


To find the minimum-norm solution of an underdetermined least-squares problem minimizing||Ax - b||2 for all columns b of a given matrix B, you can call the following:

to factorize A = L*Q;?gelqf (this routine)

to solve L*Y = B for Y;?trsm (a BLASroutine)

to compute X = (Q1)T*Y (for real matrices);?ormlq

to compute X = (Q1)H*Y (for complex matrices).?unmlq

587

(The columns of the computed X are the minimum-norm solution vectors x. Here A is an m-by-nmatrix with m < n; Q1 denotes the first m columns of Q).


(for real matrices)?orglq

(for complex matrices).?unglq

?orglqGenerates the real orthogonal matrix Q of the LQfactorization formed by ?gelqf.

Syntax

Fortran 77:

call sorglq(m, n, k, a, lda, tau, work, lwork, info)

call dorglq(m, n, k, a, lda, tau, work, lwork, info)

Fortran 95:

call orglq(a, tau [,info])

Description

The routine generates the whole or part of n-by-n orthogonal matrix Q of the LQ factorizationformed by the routines sgelqf/gelqf. Use this routine after a call to sgelqf/dgelqf.

Usually Q is determined from the LQ factorization of an p-by-n matrix A with n ≥ p. To computethe whole matrix Q, use:

call ?orglq(n, n, p, a, lda, tau, work, lwork, info)

To compute the leading p rows of Q, which form an orthonormal basis in the space spanned bythe rows of A, use:

call ?orglq(p, n, p, a, lda, tau, work, lwork, info)

To compute the matrix Qk of the LQ factorization of A's leading k rows, use:

call ?orglq(n, n, k, a, lda, tau, work, lwork, info)

To compute the leading k rows of Qk, which form an orthonormal basis in the space spannedby A's leading k rows, use:

call ?orgqr(k, n, k, a, lda, tau, work, lwork, info)

588

Input Parameters

INTEGER. The number of rows of Q to be computedm

(0 ≤ m ≤ n).

INTEGER. The order of the orthogonal matrix Q (n ≥ m).n


product defines the matrix Q (0 ≤ k ≤ m).

k

REAL for sorglqa, tau, workDOUBLE PRECISION for dorglqArrays: a(lda,*) and tau(*) are the arrays returned bysgelqf/dgelqf.The second dimension of a must be at least max(1, n).The dimension of tau must be at least max(1, k).work is a workspace array, its dimension max(1,lwork).



Output Parameters

Overwritten by m leading rows of the n-by-n orthogonalmatrix Q.

a


work(1)


589




Specific details for the routine orglq interface are the following:



Application Notes






The computed Q differs from an exactly orthogonal matrix by a matrix E such that ||E||2 =

O(ε) ||A||2, where e is the machine precision.


If m = k, the number is approximately (2/3)*m2*(3n - m).

The complex counterpart of this routine is ?unglq.

590


?ormlqMultiplies a real matrix by the orthogonal matrixQ of the LQ factorization formed by ?gelqf.

Syntax

Fortran 77:

call sormlq(side, trans, m, n, k, a, lda, tau, c, ldc, work, lwork, info)

call dormlq(side, trans, m, n, k, a, lda, tau, c, ldc, work, lwork, info)

Fortran 95:

call ormlq(a, tau, c [,side] [,trans] [,info])

Description

The routine multiplies a real m-by-n matrix C by Q or Q T, where Q is the orthogonal matrix Q ofthe LQ factorization formed by the routine sgelqf/dgelqf.

Depending on the parameters side and trans, the routine can form one of the matrix productsQ*C, QT*C, C*Q, or C*QT (overwriting the result on C).

Input Parameters

CHARACTER*1. Must be either 'L' or 'R'.sideIf side = 'L', Q or QT is applied to C from the left.If side = 'R', Q or QT is applied to C from the right.

CHARACTER*1. Must be either 'N' or 'T'.transIf trans = 'N', the routine multiplies C by Q.If trans = 'T', the routine multiplies C by QT.




k

0 ≤ k ≤ m if side = 'L';

0 ≤ k ≤ n if side = 'R'.

REAL for sormlqa, c, tau, work

591


DOUBLE PRECISION for dormlq.Arrays:a(lda,*) and tau(*) are arrays returned by ?gelqf.The second dimension of a must be:at least max(1, m) if side = 'L';at least max(1, n) if side = 'R'.The dimension of tau must be at least max(1, k).c(ldc,*) contains the matrix C.The second dimension of c must be at least max(1, n)work is a workspace array, its dimension max(1,lwork).

INTEGER. The first dimension of a; lda ≥ max(1, k).lda

INTEGER. The first dimension of c; ldc ≥ max(1, m).ldc




Output Parameters


c


work(1)


592




Specific details for the routine ormlq interface are the following:

Holds the matrix A of size (k,m).a





Application Notes


If you are in doubt how much workspace to supply, use a generous value of lwork for the firstrun or set lwork= -1.


If you set lwork= -1, the routine returns immediately and provides the recommended workspacein the first element of the corresponding array (work). This operation is called a workspacequery.


The complex counterpart of this routine is ?unmlq.

593


?unglqGenerates the complex unitary matrix Q of the LQfactorization formed by ?gelqf.

Syntax

Fortran 77:

call cunglq(m, n, k, a, lda, tau, work, lwork, info)

call zunglq(m, n, k, a, lda, tau, work, lwork, info)

Fortran 95:

call unglq(a, tau [,info])

Description

The routine generates the whole or part of n-by-n unitary matrix Q of the LQ factorization formedby the routines cgelqf/zgelqf. Use this routine after a call to cgelqf/zgelqf.

Usually Q is determined from the LQ factorization of an p-by-n matrix A with n < p. To computethe whole matrix Q, use:

call ?unglq(n, n, p, a, lda, tau, work, lwork, info)

To compute the leading p rows of Q, which form an orthonormal basis in the space spanned bythe rows of A, use:

call ?unglq(p, n, p, a, lda, tau, work, lwork, info)

To compute the matrix Qk of the LQ factorization of A's leading k rows, use:

call ?unglq(n, n, k, a, lda, tau, work, lwork, info)

To compute the leading k rows of Qk, which form an orthonormal basis in the space spannedby A's leading k rows, use:

call ?ungqr(k, n, k, a, lda, tau, work, lwork, info)

Input Parameters

INTEGER. The number of rows of Q to be computed (0 ≤ m

≤ n).

m

INTEGER. The order of the unitary matrix Q (n ≥ m).n

594


product defines the matrix Q (0 ≤ k ≤ m).

k

COMPLEX for cunglqa, tau, workDOUBLE COMPLEX for zunglqArrays: a(lda,*) and tau(*) are the arrays returned bysgelqf/dgelqf.The second dimension of a must be at least max(1, n).The dimension of tau must be at least max(1, k).work is a workspace array, its dimension max(1, lwork).



Output Parameters

Overwritten by m leading rows of the n-by-n unitary matrixQ.

a


work(1)




Specific details for the routine unglq interface are the following:



595


Application Notes

For better performance, try using lwork = m*blocksize, where blocksize is amachine-dependent value (typically, 16 to 64) required for optimum performance of the blockedalgorithm.





The computed Q differs from an exactly unitary matrix by a matrix E such that ||E||2 = O(ε)||A||2, where e is the machine precision.


If m = k, the number is approximately (8/3)*m2*(3n - m) .

The real counterpart of this routine is ?orglq.

?unmlqMultiplies a complex matrix by the unitary matrixQ of the LQ factorization formed by ?gelqf.

Syntax

Fortran 77:

call cunmlq(side, trans, m, n, k, a, lda, tau, c, ldc, work, lwork, info)

call zunmlq(side, trans, m, n, k, a, lda, tau, c, ldc, work, lwork, info)

596


Fortran 95:

call unmlq(a, tau, c [,side] [,trans] [,info])

Description

The routine multiplies a real m-by-n matrix C by Q or QH, where Q is the unitary matrix Q of theLQ factorization formed by the routine cgelqf/zgelqf.


Input Parameters






k

0 ≤ k ≤ m if side = 'L';

0 ≤ k ≤ n if side = 'R'.

COMPLEX for cunmlqa, c, tau, workDOUBLE COMPLEX for zunmlq.Arrays:a(lda,*) and tau(*) are arrays returned by ?gelqf.The second dimension of a must be:at least max(1, m) if side = 'L';at least max(1, n) if side = 'R'.The dimension of tau must be at least max(1, k).c(ldc,*) contains the matrix C.The second dimension of c must be at least max(1, n)work is a workspace array, its dimension max(1, lwork).

597







Output Parameters


c


work(1)




Specific details for the routine unmlq interface are the following:






598


Application Notes






The real counterpart of this routine is ?ormlq.

?geqlfComputes the QL factorization of a general m-by-nmatrix.

Syntax

Fortran 77:

call sgeqlf(m, n, a, lda, tau, work, lwork, info)

call dgeqlf(m, n, a, lda, tau, work, lwork, info)

call cgeqlf(m, n, a, lda, tau, work, lwork, info)

call zgeqlf(m, n, a, lda, tau, work, lwork, info)

Fortran 95:

call geqlf(a [, tau] [,info])

599


Description

The routine forms the QL factorization of a general m-by-n matrix A. No pivoting is performed.


Input Parameters



REAL for sgeqlfa, workDOUBLE PRECISION for dgeqlfCOMPLEX for cgeqlfDOUBLE COMPLEX for zgeqlf.Arrays: a(lda,*) contains the matrix A.The second dimension of a must be at least max(1, n).work is a workspace array, its dimension max(1, lwork).


INTEGER. The size of the work array; at least max(1, n).lworkIf lwork = -1, then a workspace query is assumed; theroutine only calculates the optimal size of the work array,returns this value as the first entry of the work array, andno error message related to lwork is issued by xerbla.See Application Notes for the suggested value of lwork.

Output Parameters

Overwritten on exit by the factorization data as follows:aif m < n, the lower triangle of the subarray a(m-n+1:m, 1:n)

contains the n-by-n lower triangular matrix L; if m ≤ n, theelements on and below the (n-m)-th superdiagonal containthe m-by-n lower trapezoidal matrix L; in both cases, theremaining elements, with the array tau, represent theorthogonal/unitary matrix Q as a product of elementaryreflectors.

REAL for sgeqlftauDOUBLE PRECISION for dgeqlf

600

COMPLEX for cgeqlfDOUBLE COMPLEX for zgeqlf.Array, DIMENSION at least max(1, min(m, n)). Containsscalar factors of the elementary reflectors for the matrix Q.

If info = 0, on exit work(1) contains the minimum valueof lwork required for optimum performance.

work(1)




Specific details for the routine geqlf interface are the following:



Application Notes






601


Related routines include:

to generate matrix Q (for real matrices);?orgql

to generate matrix Q (for complex matrices);?ungql

to apply matrix Q (for real matrices);?ormql

to apply matrix Q (for complex matrices).?unmql

?orgqlGenerates the real matrix Q of the QL factorizationformed by ?geqlf.

Syntax

Fortran 77:

call sorgql(m, n, k, a, lda, tau, work, lwork, info)

call dorgql(m, n, k, a, lda, tau, work, lwork, info)

Fortran 95:

call orgql(a, tau [,info])

Description

The routine generates an m-by-n real matrix Q with orthonormal columns, which is defined asthe last n columns of a product of k elementary reflectors H(i) of order m: Q = H(k) *...*H(2)*H(1) as returned by the routines sgeqlf/dgeqlf. Use this routine after a call tosgeqlf/dgeqlf.

Input Parameters

INTEGER. The number of rows of the matrix Q (m≥ 0).m

INTEGER. The number of columns of the matrix Q (m≥ n≥0).

n


product defines the matrix Q (n≥ k≥ 0).

k

REAL for sorgqla, tau, workDOUBLE PRECISION for dorgql

602


Arrays: a(lda,*), tau(*).On entry, the (n - k + i)th column of a must contain thevector which defines the elementary reflector H(i), for i =1,2,...,k, as returned by sgeqlf/dgeqlf in the last kcolumns of its array argument a; tau(i) must contain thescalar factor of the elementary reflector H(i), as returnedby sgeqlf/dgeqlf;The second dimension of a must be at least max(1, n).The dimension of tau must be at least max(1, k).work is a workspace array, its dimension max(1,lwork).



Output Parameters

Overwritten by the m-by-n matrix Q.a


work(1)




Specific details for the routine orgql interface are the following:



603


Application Notes






The complex counterpart of this routine is ?ungql.

?ungqlGenerates the complex matrix Q of the QLfactorization formed by ?geqlf.

Syntax

Fortran 77:

call cungql(m, n, k, a, lda, tau, work, lwork, info)

call zungql(m, n, k, a, lda, tau, work, lwork, info)

Fortran 95:

call ungql(a, tau [,info])

604


Description

The routine generates an m-by-n complex matrix Q with orthonormal columns, which is definedas the last n columns of a product of k elementary reflectors H(i) of order m: Q = H(k) *...*H(2)*H(1) as returned by the routines cgeqlf/zgeqlf . Use this routine after a call tocgeqlf/zgeqlf.

Input Parameters


INTEGER. The number of columns of the matrix Q (m≥n≥ 0).n


product defines the matrix Q (n≥ k≥ 0).

k

COMPLEX for cungqla, tau, workDOUBLE COMPLEX for zungqlArrays: a(lda,*), tau(*), work(lwork).On entry, the (n - k + i)th column of a must contain thevector which defines the elementary reflector H(i), for i =1,2,...,k, as returned by cgeqlf/zgeqlf in the last kcolumns of its array argument a;tau(i) must contain the scalar factor of theelementaryreflector H(i), as returned by cgeqlf/zgeqlf;The second dimension of a must be at least max(1, n).The dimension of tau must be at least max(1, k).work is a workspace array, its dimension max(1, lwork).



Output Parameters


605



work(1)




Specific details for the routine ungql interface are the following:



Application Notes






The real counterpart of this routine is ?orgql.

606


?ormqlMultiplies a real matrix by the orthogonal matrixQ of the QL factorization formed by ?geqlf.

Syntax

Fortran 77:

call sormql(side, trans, m, n, k, a, lda, tau, c, ldc, work, lwork, info)

call dormql(side, trans, m, n, k, a, lda, tau, c, ldc, work, lwork, info)

Fortran 95:

call ormql(a, tau, c [,side] [,trans] [,info])

Description

This routine multiplies a real m-by-n matrix C by Q or QT, where Q is the orthogonal matrix Q ofthe QL factorization formed by the routine sgeqlf/dgeqlf .

Depending on the parameters side and trans, the routine ?ormql can form one of the matrixproducts Q*C, QT*C, C*Q, or C*QT (overwriting the result over C).

Input Parameters



INTEGER. The number of rows in the matrix C (m≥ 0).m

INTEGER. The number of columns in C (n≥ 0).n


k

0 ≤k≤m if side = 'L';

0 ≤k≤n if side = 'R'.

REAL for sormqla, tau, c, work

607


DOUBLE PRECISION for dormql.Arrays: a(lda,*), tau(*), c(ldc,*).On entry, the ith column of a must contain the vector whichdefines the elementary reflector Hi, for i = 1,2,...,k, asreturned by sgeqlf/dgeqlf in the last k columns of itsarray argument a.The second dimension of a must be at least max(1, k).tau(i) must contain the scalar factor of the elementaryreflector Hi, as returned by sgeqlf/dgeqlf.The dimension of tau must be at least max(1, k).c(ldc,*) contains the m-by-n matrix C.The second dimension of c must be at least max(1, n)work is a workspace array, its dimension max(1,lwork).

INTEGER. The first dimension of a;lda

if side = 'L', lda≥ max(1, m);

if side = 'R', lda≥ max(1, n).

INTEGER. The first dimension of c; ldc≥ max(1, m).ldc


lwork≥ max(1, n) if side = 'L';

lwork≥ max(1, m) if side = 'R'.If lwork = -1, then a workspace query is assumed; theroutine only calculates the optimal size of the work array,returns this value as the first entry of the work array, andno error message related to lwork is issued by xerbla.See Application Notes for the suggested value of lwork.

Output Parameters


c


work(1)

INTEGER.info

608


If info = 0, the execution is successful.If info = -i, the ith parameter had an illegal value.



Specific details for the routine ormql interface are the following:






Application Notes






The complex counterpart of this routine is ?unmql.

609


?unmqlMultiplies a complex matrix by the unitary matrixQ of the QL factorization formed by ?geqlf.

Syntax

Fortran 77:

call cunmql(side, trans, m, n, k, a, lda, tau, c, ldc, work, lwork, info)

call zunmql(side, trans, m, n, k, a, lda, tau, c, ldc, work, lwork, info)

Fortran 95:

call unmql(a, tau, c [,side] [,trans] [,info])

Description

The routine multiplies a complex m-by-n matrix C by Q or QH, where Q is the unitary matrix Q ofthe QL factorization formed by the routine cgeqlf/zgeqlf .

Depending on the parameters side and trans, the routine ?unmql can form one of the matrixproducts Q*C, QH*C, C*Q, or C*QH (overwriting the result over C).

Input Parameters






k

0 ≤ k ≤ m if side = 'L';

0 ≤ k ≤ n if side = 'R'.

COMPLEX for cunmqla, tau, c, work

610


DOUBLE COMPLEX for zunmql.Arrays: a(lda,*), tau(*), c(ldc,*), work(lwork).On entry, the i-th column of a must contain the vectorwhich defines the elementary reflector H(i), for i = 1,2,...,k,as returned by cgeqlf/zgeqlf in the last k columns of itsarray argument a.The second dimension of a must be at least max(1, k).tau(i) must contain the scalar factor of the elementaryreflectorH(i), as returned by cgeqlf/zgeqlf.The dimension of tau must be at least max(1, k).c(ldc,*) contains the m-by-n matrix C.The second dimension of c must be at least max(1, n)work is a workspace array, its dimension max(1, lwork).






Output Parameters


c


work(1)


611




Specific details for the routine unmql interface are the following:




Must be 'L' or 'L'. The default value is 'L'.side


Application Notes






The real counterpart of this routine is ?ormql.

612


?gerqfComputes the RQ factorization of a general m-by-nmatrix.

Syntax

Fortran 77:

call sgerqf(m, n, a, lda, tau, work, lwork, info)

call dgerqf(m, n, a, lda, tau, work, lwork, info)

call cgerqf(m, n, a, lda, tau, work, lwork, info)

call zgerqf(m, n, a, lda, tau, work, lwork, info)

Fortran 95:

call gerqf(a [, tau] [,info])

Description

The routine forms the RQ factorization of a general m-by-n matrix A. No pivoting is performed.


Input Parameters



REAL for sgerqfa, workDOUBLE PRECISION for dgerqfCOMPLEX for cgerqfDOUBLE COMPLEX for zgerqf.Arrays:a(lda,*) contains the m-by-n matrix A.The second dimension of a must be at least max(1, n).work is a workspace array, its dimension max(1, lwork).


INTEGER. The size of the work array;lwork

613


lwork ≥ max(1, m).If lwork = -1, then a workspace query is assumed; theroutine only calculates the optimal size of the work array,returns this value as the first entry of the work array, andno error message related to lwork is issued by xerbla.See Application Notes for the suggested value of lwork.

Output Parameters

Overwritten on exit by the factorization data as follows:a

if m ≤ n, the upper triangle of the subarraya(1:m, n-m+1:n ) contains the m-by-m upper triangular matrixR;

if m ≥ n, the elements on and above the (m-n)th subdiagonalcontain the m-by-n upper trapezoidal matrix R;in both cases, the remaining elements, with the array tau,represent the orthogonal/unitary matrix Q as a product ofmin(m,n) elementary reflectors.

REAL for sgerqftauDOUBLE PRECISION for dgerqfCOMPLEX for cgerqfDOUBLE COMPLEX for zgerqf.Array, DIMENSION at least max (1, min(m, n)).Contains scalar factors of the elementary reflectors for thematrix Q.

If info = 0, on exit work(1) contains the minimum valueof lwork required for optimum performance.

work(1)




Specific details for the routine gerqf interface are the following:

614




Application Notes







to generate matrix Q (for real matrices);?orgrq

to generate matrix Q (for complex matrices);?ungrq

to apply matrix Q (for real matrices);?ormrq

to apply matrix Q (for complex matrices).?unmrq

615


?orgrqGenerates the real matrix Q of the RQ factorizationformed by ?gerqf.

Syntax

Fortran 77:

call sorgrq(m, n, k, a, lda, tau, work, lwork, info)

call dorgrq(m, n, k, a, lda, tau, work, lwork, info)

Fortran 95:

call orgrq(a, tau [,info])

Description

The routine generates an m-by-n real matrix Q with orthonormal rows, which is defined as thelast m rows of a product of k elementary reflectors H(i) of order n: Q = H(1)*H(2)*...*H(k)as returned by the routines sgerqf/dgerqf. Use this routine after a call tosgerqf/dgerqf.

Input Parameters


INTEGER. The number of columns of the matrix Q (n≥ m).n


product defines the matrix Q (m≥ k≥ 0).

k

REAL for sorgrqa, tau, workDOUBLE PRECISION for dorgrqArrays: a(lda,*), tau(*).On entry, the (m - k + i)th row of a must contain the vectorwhich defines the elementary reflector H(i), for i = 1,2,...,k,as returned by sgerqf/dgerqf in the last k rows of its arrayargument a;tau(i) must contain the scalar factor of the elementaryreflector H(i), as returned by sgerqf/dgerqf;The second dimension of a must be at least max(1, n).

616


The dimension of tau must be at least max(1, k).work is a workspace array, its dimension max(1, lwork).



Output Parameters



work(1)




Specific details for the routine orgrq interface are the following:



Application Notes



617





The complex counterpart of this routine is ?ungrq.

?ungrqGenerates the complex matrix Q of the RQfactorization formed by ?gerqf.

Syntax

Fortran 77:

call cungrq(m, n, k, a, lda, tau, work, lwork, info)

call zungrq(m, n, k, a, lda, tau, work, lwork, info)

Fortran 95:

call ungrq(a, tau [,info])

Description

The routine generates an m-by-n complex matrix Q with orthonormal rows, which is defined asthe last m rows of a product of k elementary reflectors H(i) of order n: Q = H(1)H*H(2)H*...*H(k)H as returned by the routines sgerqf/dgerqf. Use this routine after a call tosgerqf/dgerqf.

Input Parameters


INTEGER. The number of columns of the matrix Q (n≥ m ).n

618



product defines the matrix Q (m≥ k≥ 0).

k

REAL for cungrqa, tau, workDOUBLE PRECISION for zungrqArrays: a(lda,*), tau(*), work(lwork).On entry, the (m - k + i)th row of a must contain the vectorwhich defines the elementary reflector H(i), for i = 1,2,...,k,as returned by sgerqf/dgerqf in the last k rows of its arrayargument a;tau(i) must contain the scalar factor of the elementaryreflector H(i), as returned by sgerqf/dgerqf;The second dimension of a must be at least max(1, n).The dimension of tau must be at least max(1, k).work is a workspace array, its dimension max(1, lwork).



Output Parameters



work(1)




619


Specific details for the routine ungrq interface are the following:



Application Notes






The real counterpart of this routine is ?orgrq.

?ormrqMultiplies a real matrix by the orthogonal matrixQ of the RQ factorization formed by ?gerqf.

Syntax

Fortran 77:

call sormrq(side, trans, m, n, k, a, lda, tau, c, ldc, work, lwork, info)

call dormrq(side, trans, m, n, k, a, lda, tau, c, ldc, work, lwork, info)

Fortran 95:

call ormrq(a, tau, c [,side] [,trans] [,info])

620


Description

The routine multiplies a real m-by-n matrix C by Q or QT, where Q is the real orthogonal matrixdefined as a product of k elementary reflectors Hi : Q = H1 H2 ... Hk as returned by the RQ

factorization routine sgerqf/dgerqf .

Depending on the parameters side and trans, the routine can form one of the matrix productsQC, QTC, CQ, or CQT (overwriting the result over C).

Input Parameters






k

0 ≤ k ≤ m, if side = 'L';

0 ≤ k ≤ n, if side = 'R'.

REAL for sormrqa, tau, c, workDOUBLE PRECISION for dormrq.Arrays: a(lda,*), tau(*), c(ldc,*).On entry, the ith row of a must contain the vector whichdefines the elementary reflector Hi, for i = 1,2,...,k, asreturned by sgerqf/dgerqf in the last k rows of its arrayargument a.The second dimension of a must be at least max(1, m) ifside = 'L', and at least max(1, n) if side = 'R'.tau(i) must contain the scalar factor of the elementaryreflector Hi, as returned by sgerqf/dgerqf.The dimension of tau must be at least max(1, k).c(ldc,*) contains the m-by-n matrix C.The second dimension of c must be at least max(1, n)

621


work is a workspace array, its dimension max(1, lwork).






Output Parameters

Overwritten by the product QC, QTC, CQ, or CQT (as specifiedby side and trans).

c


work(1)

INTEGER.infoIf info = 0, the execution is successful.If info = -i, the ith parameter had an illegal value.



Specific details for the routine ormrq interface are the following:






622


Application Notes






The complex counterpart of this routine is ?unmrq.

?unmrqMultiplies a complex matrix by the unitary matrixQ of the RQ factorization formed by ?gerqf.

Syntax

Fortran 77:

call cunmrq(side, trans, m, n, k, a, lda, tau, c, ldc, work, lwork, info)

call zunmrq(side, trans, m, n, k, a, lda, tau, c, ldc, work, lwork, info)

Fortran 95:

call unmrq(a, tau, c [,side] [,trans] [,info])

623


Description

The routine multiplies a complex m-by-n matrix C by Q or QH, where Q is the complex unitarymatrix defined as a product of k elementary reflectors H(i) of order n: Q = H(1)H*H(2)H*...*H(k)Has returned by the RQ factorization routine cgerqf/zgerqf .

Depending on the parameters side and trans, the routine can form one of the matrix productsQ*C, QH*C, C*Q, or C*QH (overwriting the result over C).

Input Parameters






k

0 ≤ k ≤ m, if side = 'L';

0 ≤ k ≤ n, if side = 'R'.

COMPLEX for cunmrqa, tau, c, workDOUBLE COMPLEX for zunmrq.Arrays: a(lda,*), tau(*), c(ldc,*), work(lwork).On entry, the ith row of a must contain the vector whichdefines the elementary reflector H(i), for i = 1,2,...,k, asreturned by cgerqf/zgerqf in the last k rows of its arrayargument a.The second dimension of a must be at least max(1, m) ifside = 'L', and at least max(1, n) if side = 'R'.tau(i) must contain the scalar factor of the elementaryreflector H(i), as returned by cgerqf/zgerqf.The dimension of tau must be at least max(1, k).c(ldc,*) contains the m-by-n matrix C.The second dimension of c must be at least max(1, n)

624


work is a workspace array, its dimension max(1,lwork).

INTEGER. The first dimension of a; lda ≥ max(1, k) .lda





Output Parameters


c


work(1)




Specific details for the routine unmrq interface are the following:






625


Application Notes






The real counterpart of this routine is ?ormrq.

?tzrzfReduces the upper trapezoidal matrix A to uppertriangular form.

Syntax

Fortran 77:

call stzrzf(m, n, a, lda, tau, work, lwork, info)

call dtzrzf(m, n, a, lda, tau, work, lwork, info)

call ctzrzf(m, n, a, lda, tau, work, lwork, info)

call ztzrzf(m, n, a, lda, tau, work, lwork, info)

Fortran 95:

call tzrzf(a [, tau] [,info])

626


Description

This routine reduces the m-by-n (m ≤ n) real/complex upper trapezoidal matrix A to uppertriangular form by means of orthogonal/unitary transformations. The upper trapezoidal matrixA is factored as

A = ( R 0 ) * Z,

where Z is an n-by-n orthogonal/unitary matrix and R is an m-by-m upper triangular matrix.

See ?larz that applies an elementary reflector returned by ?tzrzf to a general matrix.

Input Parameters


INTEGER. The number of columns in A (n ≥ m).n

REAL for stzrzfa, workDOUBLE PRECISION for dtzrzfCOMPLEX for ctzrzfDOUBLE COMPLEX for ztzrzf.Arrays: a(lda,*), work(lwork).The leading m-by-n uppertrapezoidal part of the array a contains the matrix A to befactorized.The second dimension of a must be at least max(1, n).work is a workspace array, its dimension max(1, lwork).



lwork ≥ max(1, m).If lwork = -1, then a workspace query is assumed; theroutine only calculates the optimal size of the work array,returns this value as the first entry of the work array, andno error message related to lwork is issued by xerbla.See Application Notes for the suggested value of lwork.

Output Parameters

Overwritten on exit by the factorization data as follows:a

627


the leading m-by-m upper triangular part of a contains theupper triangular matrix R, and elements m +1 to n of thefirst m rows of a, with the array tau, represent theorthogonal matrix Z as a product of m elementary reflectors.

REAL for stzrzftauDOUBLE PRECISION for dtzrzfCOMPLEX for ctzrzfDOUBLE COMPLEX for ztzrzf.Array, DIMENSION at least max (1, m). Contains scalarfactors of the elementary reflectors for the matrix Z.


work(1)




Specific details for the routine tzrzf interface are the following:


Holds the vector of length (m).tau

Application Notes




628





to apply matrix Q (for real matrices)?ormrz

to apply matrix Q (for complex matrices).?unmrz

?ormrzMultiplies a real matrix by the orthogonal matrixdefined from the factorization formed by ?tzrzf.

Syntax

Fortran 77:

call sormrz(side, trans, m, n, k, l, a, lda, tau, c, ldc, work, lwork, info)

call dormrz(side, trans, m, n, k, l, a, lda, tau, c, ldc, work, lwork, info)

Fortran 95:

call ormrz(a, tau, c, l [, side] [,trans] [,info])

Description

The routine multiplies a real m-by-n matrix C by Q or QT, where Q is the real orthogonal matrixdefined as a product of k elementary reflectors H(i) of order n: Q = H(1)* H(2)*...*H(k)as returned by the factorization routine stzrzf/dtzrzf .

Depending on the parameters side and trans, the routine can form one of the matrix productsQ*C, QT*C, C*Q, or C*QT (overwriting the result over C).

The matrix Q is of order m if side = 'L' and of order n if side = 'R'.

Input Parameters

CHARACTER*1. Must be either 'L' or 'R'.sideIf side = 'L', Q or QT is applied to C from the left.

629


If side = 'R', Q or QT is applied to C from the right.





k

0 ≤ k ≤ m, if side = 'L';

0 ≤ k ≤ n, if side = 'R'.

INTEGER.lThe number of columns of the matrix A containing themeaningful part of the Householder reflectors. Constraints:

0 ≤ l ≤ m, if side = 'L';

0 ≤ l ≤ n, if side = 'R'.

REAL for sormrza, tau, c, workDOUBLE PRECISION for dormrz.Arrays: a(lda,*), tau(*), c(ldc,*).On entry, the ith row of a must contain the vector whichdefines the elementary reflector H(i), for i = 1,2,...,k, asreturned by stzrzf/dtzrzf in the last k rows of its arrayargument a.The second dimension of a must be at least max(1, m) ifside = 'L', and at least max(1, n) if side = 'R'.tau(i) must contain the scalar factor of the elementaryreflector H(i), as returned by stzrzf/dtzrzf.The dimension of tau must be at least max(1, k).c(ldc,*) contains the m-by-n matrix C.The second dimension of c must be at least max(1, n)work is a workspace array, its dimension max(1, lwork).

INTEGER. The first dimension of a; lda ≥ max(1, k) .lda



630




Output Parameters

Overwritten by the product Q*C, QT*C, C*Q, or C*QTc(as specified by side and trans).


work(1)




Specific details for the routine ormrz interface are the following:






Application Notes


631






The complex counterpart of this routine is ?unmrz.

?unmrzMultiplies a complex matrix by the unitary matrixdefined from the factorization formed by ?tzrzf.

Syntax

Fortran 77:

call cunmrz(side, trans, m, n, k, l, a, lda, tau, c, ldc, work, lwork, info)

call zunmrz(side, trans, m, n, k, l, a, lda, tau, c, ldc, work, lwork, info)

Fortran 95:

call unmrz(a, tau, c, l [,side] [,trans] [,info])

Description

The routine multiplies a complex m-by-n matrix C by Q or QH, where Q is the unitary matrixdefined as a product of k elementary reflectorsH(i):

Q = H(1)H* H(2)H*...*H(k)H as returned by the factorization routine ctzrzf/ztzrzf .

Depending on the parameters side and trans, the routine can form one of the matrix productsQ*C, QH*C, C*Q, or C*QH (overwriting the result over C).

The matrix Q is of order m if side = 'L' and of order n if side = 'R'.

632


Input Parameters






k

0 ≤ k ≤ m, if side = 'L';

0 ≤ k ≤ n, if side = 'R'.

INTEGER.lThe number of columns of the matrix A containing themeaningful part of the Householder reflectors. Constraints:

0 ≤ l ≤ m, if side = 'L';

0 ≤ l ≤ n, if side = 'R'.

COMPLEX for cunmrza, tau, c, workDOUBLE COMPLEX for zunmrz.Arrays: a(lda,*), tau(*), c(ldc,*), work(lwork).On entry, the ith row of a must contain the vector whichdefines the elementary reflector H(i), for i = 1,2,...,k, asreturned by ctzrzf/ztzrzf in the last k rows of its arrayargument a.The second dimension of a must be at least max(1, m) ifside = 'L', and at least max(1, n) if side = 'R'.tau(i) must contain the scalar factor of the elementaryreflector H(i), as returned by ctzrzf/ztzrzf.The dimension of tau must be at least max(1, k).c(ldc,*) contains the m-by-n matrix C.The second dimension of c must be at least max(1, n)work is a workspace array, its dimension max(1,lwork).

633







Output Parameters


c


work(1)




Specific details for the routine unmrz interface are the following:






634


Application Notes






The real counterpart of this routine is ?ormrz.

?ggqrfComputes the generalized QR factorization of twomatrices.

Syntax

Fortran 77:

call sggqrf(n, m, p, a, lda, taua, b, ldb, taub, work, lwork, info)

call dggqrf(n, m, p, a, lda, taua, b, ldb, taub, work, lwork, info)

call cggqrf(n, m, p, a, lda, taua, b, ldb, taub, work, lwork, info)

call zggqrf(n, m, p, a, lda, taua, b, ldb, taub, work, lwork, info)

Fortran 95:

call ggqrf(a, b [,taua] [,taub] [,info])

635


Description

The routine forms the generalized QR factorization of an n-by-m matrix A and an n-by-p matrixB as A = Q*R, B = Q*T*Z, where Q is an n-by-n orthogonal/unitary matrix, Z is a p-by-porthogonal/unitary matrix, and R and T assume one of the forms:

or

where R11 is upper triangular, and

where T12 or T21 is a p-by-p upper triangular matrix.

In particular, if B is square and nonsingular, the GQR factorization of A and B implicitly gives theQR factorization of B-1A as:

B-1*A = ZH*(T-1*R)

636


Input Parameters

INTEGER. The number of rows of the matrices A and B (n

≥ 0).

n

INTEGER. The number of columns in A (m ≥ 0).m

INTEGER. The number of columns in B (p ≥ 0).p

REAL for sggqrfa, b, workDOUBLE PRECISION for dggqrfCOMPLEX for cggqrfDOUBLE COMPLEX for zggqrf.Arrays: a(lda,*) contains the matrix A.The second dimension of a must be at least max(1, m).b(ldb,*) contains the matrix B.The second dimension of b must be at least max(1, p).work is a workspace array, its dimension max(1, lwork).


INTEGER. The first dimension of b; at least max(1, n).ldb

INTEGER. The size of the work array; must be at leastmax(1, n, m, p).

lwork


Output Parameters

Overwritten by the factorization data as follows:a, bon exit, the elements on and above the diagonal of the arraya contain the min(n,m)-by-m upper trapezoidal matrix R (R

is upper triangular if n ≥ m);the elements below thediagonal, with the array taua, represent theorthogonal/unitary matrix Q as a product of min(n,m)elementary reflectors ;

if n ≤ p, the upper triangle of the subarray b(1:n, p-n+1:p) contains the n-by-n upper triangular matrix T;

637


if n > p, the elements on and above the (n-p)th subdiagonalcontain the n-by-p upper trapezoidal matrix T; the remainingelements, with the array taub, represent theorthogonal/unitary matrix Z as a product of elementaryreflectors.

REAL for sggqrftaua, taubDOUBLE PRECISION for dggqrfCOMPLEX for cggqrfDOUBLE COMPLEX for zggqrf.Arrays, DIMENSION at least max (1, min(n, m)) for taua andat least max (1, min(n, p)) for taub. The array taua containsthe scalar factors of the elementary reflectors whichrepresent the orthogonal/unitary matrix Q.The array taub contains the scalar factors of the elementaryreflectors which represent the orthogonal/unitary matrix Z.


work(1)




Specific details for the routine ggqrf interface are the following:

Holds the matrix A of size (n,m).a

Holds the matrix B of size (n,p).b

Holds the vector of length min(n,m).taua

Holds the vector of length min(n,p).taub

638


Application Notes

For better performance, try using lwork ≥ max(n,m, p)*max(nb1,nb2,nb3), where nb1 isthe optimal blocksize for the QR factorization of an n-by-m matrix, nb2 is the optimal blocksizefor the RQ factorization of an n-by-p matrix, and nb3 is the optimal blocksize for a call of?ormqr/?unmqr.





?ggrqfComputes the generalized RQ factorization of twomatrices.

Syntax

Fortran 77:

call sggrqf (m, p, n, a, lda, taua, b, ldb, taub, work, lwork, info)

call dggrqf (m, p, n, a, lda, taua, b, ldb, taub, work, lwork, info)

call cggrqf (m, p, n, a, lda, taua, b, ldb, taub, work, lwork, info)

call zggrqf (m, p, n, a, lda, taua, b, ldb, taub, work, lwork, info)

Fortran 95:

call ggrqf(a, b [,taua] [,taub] [,info])

639


Description

The routine forms the generalized RQ factorization of an m-by-n matrix A and an p-by-n matrixB as A = R*Q, B = Z*T*Q, where Q is an n-by-n orthogonal/unitary matrix, Z is a p-by-porthogonal/unitary matrix, and R and T assume one of the forms:

or

where R11 or R21 is upper triangular, and

or

where T11 is upper triangular.

640


In particular, if B is square and nonsingular, the GRQ factorization of A and B implicitly gives theRQ factorization of A*B-1 as:

A*B-1 = (R*T-1)*ZH

Input Parameters

INTEGER. The number of rows of the matrix A (m ≥ 0).m

INTEGER. The number of rows in B (p ≥ 0).p

INTEGER. The number of columns of the matrices A and B

(n ≥ 0).

n

REAL for sggrqfa, b, workDOUBLE PRECISION for dggrqfCOMPLEX for cggrqfDOUBLE COMPLEX for zggrqf.Arrays:a(lda,*) contains the m-by-n matrix A.The second dimension of a must be at least max(1, n).b(ldb,*) contains the p-by-n matrix B.The second dimension of b must be at least max(1, n).work is a workspace array, its dimension max(1, lwork).


INTEGER. The first dimension of b; at least max(1, p).ldb

INTEGER. The size of the work array; must be at leastmax(1, n, m, p).

lwork


Output Parameters

Overwritten by the factorization data as follows:a, bon exit, if m ≤ n, the upper triangle of the subarray a(1:m,n-m+1:n ) contains the m-by-m upper triangular matrix R;

641


if m > n, the elements on and above the (m-n)th subdiagonalcontain the m-by-n upper trapezoidal matrix R;the remaining elements, with the array taua, represent theorthogonal/unitary matrix Q as a product of elementaryreflectors; the elements on and above the diagonal of thearray b contain the min(p,n)-by-n upper trapezoidal matrix

T (T is upper triangular if p ≥ n); the elements below thediagonal, with the array taub, represent theorthogonal/unitary matrix Z as a product of elementaryreflectors.

REAL for sggrqftaua, taubDOUBLE PRECISION for dggrqfCOMPLEX for cggrqfDOUBLE COMPLEX for zggrqf.Arrays, DIMENSION at least max (1, min(m, n)) for taua andat least max (1, min(p, n)) for taub.The array taua contains the scalar factors of the elementaryreflectors which represent the orthogonal/unitary matrix Q.The array taub contains the scalar factors of the elementaryreflectors which represent the orthogonal/unitary matrix Z.


work(1)




Specific details for the routine ggrqf interface are the following:


Holds the matrix A of size (p,n).b

Holds the vector of length min(m,n).taua

642


Holds the vector of length min(p,n).taub

Application Notes

For better performance, try using

lwork ≥ max(n,m, p)*max(nb1,nb2,nb3),

where nb1 is the optimal blocksize for the RQ factorization of an m-by-n matrix, nb2 is the optimalblocksize for the QR factorization of an p-by-n matrix, and nb3 is the optimal blocksize for a callof ?ormrq/?unmrq.

If you are in doubt how much workspace to supply, use a generous value of lwork for the firstrun or set lwork= -1.


If you set lwork= -1, the routine returns immediately and provides the recommended workspacein the first element of the corresponding array (work). This operation is called a workspacequery.



This section describes LAPACK routines for computing the singular value decomposition(SVD) of a general m-by-n matrix A:

A = UΣVH.

In this decomposition, U and V are unitary (for complex A) or orthogonal (for real A); Σ is an

m-by-n diagonal matrix with real diagonal elements σi:

σ1 < σ2 < ... < σmin(m, n) < 0.

The diagonal elements σi are singular values of A. The first min(m, n) columns of thematrices U and V are, respectively, left and right singular vectors of A. The singularvalues and singular vectors satisfy

643

Avi = σiui and AHui = σivi

where ui and vi are the i th columns of U and V, respectively.

To find the SVD of a general matrix A, call the LAPACK routine ?gebrd or ?gbbrd for reducingA to a bidiagonal matrix B by a unitary (orthogonal) transformation: A = QBPH. Then call ?bdsqr,

which forms the SVD of a bidiagonal matrix: B = U1ΣV1H.

Thus, the sought-for SVD of A is given by A = UΣVH =(QU1)Σ(V1HPH).

Table 4-2 lists LAPACK routines (Fortran-77 interface) that perform singular value decompositionof matrices. Respective routine names in Fortran-95 interface are without the first symbol (seeRoutine Naming Conventions).

Table 4-2 Computational Routines for Singular Value Decomposition (SVD)

Complex matricesReal matricesOperation

?gebrd?gebrdReduce A to a bidiagonal matrix B: A =QBPH (full storage)

?gbbrd?gbbrdReduce A to a bidiagonal matrix B: A =QBPH (band storage)

?ungbr?orgbrGenerate the orthogonal (unitary) matrixQ or P

?unmbr?ormbrApply the orthogonal (unitary) matrix Q orP

?bdsqr?bdsqr ?bdsdcForm singular value decomposition of the

bidiagonal matrix B: B = UΣVH

644


Figure 4-1 Decision Tree: Singular Value Decomposition

Figure 4-1 “Decision Tree: Singular Value Decomposition” presents a decision tree that helpsyou choose the right sequence of routines for SVD, depending on whether you need singularvalues only or singular vectors as well, whether A is real or complex, and so on.

645


You can use the SVD to find a minimum-norm solution to a (possibly) rank-deficient least-squaresproblem of minimizing ||Ax - b||2. The effective rank k of the matrix A can be determinedas the number of singular values which exceed a suitable threshold. The minimum-norm solutionis

x = Vk(Σk)-1c,

where Σk is the leading k -by- k submatrix of Σ, the matrix Vk consists of the first k columns ofV = PV1, and the vector c consists of the first k elements of UHb = U1

HQHb.

?gebrdReduces a general matrix to bidiagonal form.

Syntax

Fortran 77:

call sgebrd(m, n, a, lda, d, e, tauq, taup, work, lwork, info)

call dgebrd(m, n, a, lda, d, e, tauq, taup, work, lwork, info)

call cgebrd(m, n, a, lda, d, e, tauq, taup, work, lwork, info)

call zgebrd(m, n, a, lda, d, e, tauq, taup, work, lwork, info)

Fortran 95:

call gebrd(a [, d] [,e] [,tauq] [,taup] [,info])

Description

The routine reduces a general m-by-n matrix A to a bidiagonal matrix B by an orthogonal (unitary)transformation.

If m ≥ n, the reduction is given by

where B1 is an n-by-n upper diagonal matrix, Q and P are orthogonal or, for a complex A, unitarymatrices; Q1 consists of the first n columns of Q.

If m < n, the reduction is given by

A = Q*B*PH = Q*(B10)*PH = Q1*B1*P1

H,

646

where B1 is an m-by-m lower diagonal matrix, Q and P are orthogonal or, for a complex A, unitarymatrices; P1 consists of the first m rows of P.

The routine does not form the matrices Q and P explicitly, but represents them as products ofelementary reflectors. Routines are provided to work with the matrices Q and P in thisrepresentation:

If the matrix A is real,

• to compute Q and P explicitly, call ?orgbr.

• to multiply a general matrix by Q or P, call ?ormbr.

If the matrix A is complex,

• to compute Q and P explicitly, call ?ungbr.

• to multiply a general matrix by Q or P, call ?unmbr.

Input Parameters



REAL for sgebrda, workDOUBLE PRECISION for dgebrdCOMPLEX for cgebrdDOUBLE COMPLEX for zgebrd.Arrays:a(lda,*) contains the matrix A.The second dimension of a must be at least max(1, n).work is a workspace array, its dimension max(1, lwork).


INTEGER.lworkThe dimension of work; at least max(1, m, n).If lwork = -1, then a workspace query is assumed; theroutine only calculates the optimal size of the work array,returns this value as the first entry of the work array, andno error message related to lwork is issued by xerbla.See Application Notes for the suggested value of lwork.

647


Output Parameters

If m ≥ n, the diagonal and first super-diagonal of a areoverwritten by the upper bidiagonal matrix B. Elementsbelow the diagonal are overwritten by details of Q, and theremaining elements are overwritten by details of P.

a

If m < n, the diagonal and first sub-diagonal of a areoverwritten by the lower bidiagonal matrix B. Elementsabove the diagonal are overwritten by details of P, and theremaining elements are overwritten by details of Q.

REAL for single-precision flavorsdDOUBLE PRECISION for double-precision flavors.Array, DIMENSION at least max(1, min(m, n)).Contains the diagonal elements of B.

REAL for single-precision flavorseDOUBLE PRECISION for double-precision flavors.Array, DIMENSION at least max(1, min(m, n) - 1).Contains the off-diagonal elements of B.

REAL for sgebrdtauq, taupDOUBLE PRECISION for dgebrdCOMPLEX for cgebrdDOUBLE COMPLEX for zgebrd.Arrays, DIMENSION at least max (1, min(m, n)). Containfurther details of the matrices Q and P.


work(1)




Specific details for the routine gebrd interface are the following:

648

Holds the vector of length min(m,n).d

Holds the vector of length min(m,n)-1.e

Holds the vector of length min(m,n).tauq

Holds the vector of length min(m,n).taup

Application Notes

For better performance, try using lwork = (m + n)*blocksize, where blocksize is amachine-dependent value (typically, 16 to 64) required for optimum performance of the blockedalgorithm.





The computed matrices Q, B, and P satisfy QBPH = A + E, where ||E||2 = c(n)ε ||A||2,

c(n) is a modestly increasing function of n, and ε is the machine precision.


(4/3)*n2*(3*m - n) for m ≥ n,

(4/3)*m2*(3*n - m) for m < n.

The number of operations for complex flavors is four times greater.

If n is much less than m, it can be more efficient to first form the QR factorization of A by calling?geqrf and then reduce the factor R to bidiagonal form. This requires approximately 2*n2*(m+ n) floating-point operations.

649

If m is much less than n, it can be more efficient to first form the LQ factorization of A by calling?gelqf and then reduce the factor L to bidiagonal form. This requires approximately 2*m2*(m+ n) floating-point operations.

?gbbrdReduces a general band matrix to bidiagonal form.

Syntax

Fortran 77:

call sgbbrd(vect, m, n, ncc, kl, ku, ab, ldab, d, e, q, ldq, pt, ldpt, c, ldc,work, info)

call dgbbrd(vect, m, n, ncc, kl, ku, ab, ldab, d, e, q, ldq, pt, ldpt, c, ldc,work, info)

call cgbbrd(vect, m, n, ncc, kl, ku, ab, ldab, d, e, q, ldq, pt, ldpt, c, ldc,work, rwork, info)

call zgbbrd(vect, m, n, ncc, kl, ku, ab, ldab, d, e, q, ldq, pt, ldpt, c, ldc,work, rwork, info)

Fortran 95:

call gbbrd(a [, c] [,d] [,e] [,q] [,pt] [,kl] [,m] [,info])

Description

This routine reduces an m-by-n band matrix A to upper bidiagonal matrix B: A = Q*B*PH. Herethe matrices Q and P are orthogonal (for real A) or unitary (for complex A). They are determinedas products of Givens rotation matrices, and may be formed explicitly by the routine if required.The routine can also update a matrix C as follows: C = QH*C.

Input Parameters

CHARACTER*1. Must be 'N' or 'Q' or 'P' or 'B'.vectIf vect = 'N', neither Q nor PH is generated.If vect = 'Q', the routine generates the matrix Q.If vect = 'P', the routine generates the matrix PH.If vect = 'B', the routine generates both Q and PH.


650



INTEGER. The number of columns in C (ncc ≥ 0).ncc

INTEGER. The number of sub-diagonals within the band of

A (kl ≥ 0).

kl

INTEGER. The number of super-diagonals within the band

of A (ku ≥ 0).

ku

REAL for sgbbrdab, c, workDOUBLE PRECISION for dgbbrd COMPLEX for cgbbrdDOUBLE COMPLEX for zgbbrd.Arrays:ab(ldab,*) contains the matrix A in band storage (see MatrixStorage Schemes).The second dimension of a must be at least max(1, n).c(ldc,*) contains an m-by-ncc matrix C.If ncc = 0, the array c is not referenced.The second dimension of c must be at least max(1, ncc).work(*) is a workspace array.The dimension of work must be at least 2*max(m, n) forreal flavors, or max(m, n) for complex flavors.

INTEGER. The first dimension of the array ab (ldab ≥ kl+ ku + 1).

ldab

INTEGER. The first dimension of the output array q.ldq

ldq ≥ max(1, m) if vect = 'Q' or 'B', ldq ≥ 1otherwise.

INTEGER. The first dimension of the output array pt.ldpt

ldpt ≥ max(1, n) if vect = 'P' or 'B', ldpt ≥ 1otherwise.

INTEGER. The first dimension of the array c.ldc

ldc ≥ max(1, m) if ncc > 0; ldc ≥ 1 if ncc = 0.

REAL for cgbbrd DOUBLE PRECISION for zgbbrd.rworkA workspace array, DIMENSION at least max(m, n).

651


Output Parameters

Overwritten by values generated during the reduction.ab

REAL for single-precision flavorsdDOUBLE PRECISION for double-precision flavors.Array, DIMENSION at least max(1, min(m, n)). Contains thediagonal elements of the matrix B.

REAL for single-precision flavorseDOUBLE PRECISION for double-precision flavors.Array, DIMENSION at least max(1, min(m, n) - 1).Contains the off-diagonal elements of B.

REAL for sgebrdq, ptDOUBLE PRECISION for dgebrdCOMPLEX for cgebrdDOUBLE COMPLEX for zgebrd.Arrays:q(ldq,*) contains the output m-by-m matrix Q.The second dimension of q must be at least max(1, m).p(ldpt,*) contains the output n-by-n matrix PT.The second dimension of pt must be at least max(1, n).




Specific details for the routine gbbrd interface are the following:

Stands for argument ab in Fortan 77 interface . Holds the array A ofsize (kl+ku+1,n).

a

Holds the matrix C of size (m,ncc).c


Holds the vector of length min(m,n)-1.e

Holds the matrix Q of size (m,m).q

652


Holds the matrix PT of size (n,n).pt

If omitted, assumed m = n.m



Restored based on the presence of arguments q and pt as follows:vectvect = 'B', if both q and pt are present,vect = 'Q', if q is present and pt omitted, vect = 'P',if q is omitted and pt present, vect = 'N', if both q and pt areomitted.

Application Notes

The computed matrices Q, B, and P satisfy Q*B*PH = A + E, where ||E||2 = c(n)ε ||A||2,


If m = n, the total number of floating-point operations for real flavors is approximately thesum of:

6*n2*(kl + ku) if vect = 'N' and ncc = 0,

3*n2*ncc*(kl + ku - 1)/(kl + ku) if C is updated, and

3*n3*(kl + ku - 1)/(kl + ku) if either Q or PH is generated (double this if both).

To estimate the number of operations for complex flavors, use the same formulas with thecoefficients 20 and 10 (instead of 6 and 3).

?orgbrGenerates the real orthogonal matrix Q or PT

determined by ?gebrd.

Syntax

Fortran 77:

call sorgbr(vect, m, n, k, a, lda, tau, work, lwork, info)

call dorgbr(vect, m, n, k, a, lda, tau, work, lwork, info)

Fortran 95:

call orgbr(a, tau [,vect] [,info])

653


Description

The routine generates the whole or part of the orthogonal matrices Q and PT formed by theroutines sgebrd/dgebrd. Use this routine after a call to sgebrd/dgebrd. All valid combinationsof arguments are described in Input parameters. In most cases you need the following:

To compute the whole m-by-m matrix Q:

call ?orgbr('Q', m, m, n, a ... )

(note that the array a must have at least m columns).

To form the n leading columns of Q if m > n:

call ?orgbr('Q', m, n, n, a ... )

To compute the whole n-by-n matrix PT:

call ?orgbr('P', n, n, m, a ... )

(note that the array a must have at least n rows).

To form the m leading rows of PT if m < n:

call ?orgbr('P', m, n, m, a ... )

Input Parameters

CHARACTER*1. Must be 'Q' or 'P'.vectIf vect = 'Q', the routine generates the matrix Q.If vect = 'P', the routine generates the matrix PT.

INTEGER. The number of required rows of Q or PT.m

INTEGER. The number of required columns of Q or PT.n

INTEGER. One of the dimensions of A in ?gebrd:kIf vect = 'Q', the number of columns in A;If vect = 'P', the number of rows in A.

Constraints: m ≥ 0, n ≥ 0, k ≥ 0.

For vect = 'Q': k ≤ n ≤ m if m > k, or m = n if m ≤ k.

For vect = 'P': k ≤ m ≤ n if n > k, or m = n if n ≤ k.

REAL for sorgbra, workDOUBLE PRECISION for dorgbr.Arrays:a(lda,*) is the array a as returned by ?gebrd.The second dimension of a must be at least max(1, n).

654

work is a workspace array, its dimension max(1, lwork).


REAL for sorgbrtauDOUBLE PRECISION for dorgbr.For vect = 'Q', the array tauq as returned by ?gebrd.For vect = 'P', the array taup as returned by ?gebrd.The dimension of tau must be at least max(1, min(m, k))for vect = 'Q', or max(1, min(m, k)) for vect = 'P'.

INTEGER. The size of the work array.lworkIf lwork = -1, then a workspace query is assumed; theroutine only calculates the optimal size of the work array,returns this value as the first entry of the work array, andno error message related to lwork is issued by xerbla.See Application Notes for the suggested value of lwork.

Output Parameters

Overwritten by the orthogonal matrix Q or PT (or the leadingrows or columns thereof) as specified by vect, m, and n.

a


work(1)




Specific details for the routine orgbr interface are the following:


Holds the vector of length min(m,k) wheretauk = m, if vect = 'P',k = n, if vect = 'Q'.

Must be 'Q' or 'P'. The default value is 'Q'.vect

655


Application Notes

For better performance, try using lwork = min(m,n)*blocksize, where blocksize is amachine-dependent value (typically, 16 to 64) required for optimum performance of the blockedalgorithm.





The computed matrix Q differs from an exactly orthogonal matrix by a matrix E such that ||E||2

= O(ε).

The approximate numbers of floating-point operations for the cases listed in Description areas follows:

To form the whole of Q:

(4/3)n(3m2 - 3m*n + n2) if m > n;

(4/3)m3 if m ≤ n.

To form the n leading columns of Q when m > n:

(2/3)n2(3m - n2) if m > n.

To form the whole of PT:

(4/3)n3 if m ≥ n;

(4/3)m(3n2 - 3m*n + m2) if m < n.

To form the m leading columns of PT when m < n:

(2/3)n2(3m - n2) if m > n.

656

The complex counterpart of this routine is ?ungbr.

?ormbrMultiplies an arbitrary real matrix by the realorthogonal matrix Q or PT determined by ?gebrd.

Syntax

Fortran 77:

call sormbr(vect, side, trans, m, n, k, a, lda, tau, c, ldc, work, lwork,info)

call dormbr(vect, side, trans, m, n, k, a, lda, tau, c, ldc, work, lwork,info)

Fortran 95:

call ormbr(a, tau, c [,vect] [,side] [,trans] [,info])

Description

Given an arbitrary real matrix C, this routine forms one of the matrix products Q*C, QT*C, C*Q,C*QT, P*C, PT*C, C*P, or C*PT, where Q and P are orthogonal matrices computed by a call tosgebrd/dgebrd. The routine overwrites the product on C.

Input Parameters

In the descriptions below, r denotes the order of Q or PT:

If side = 'L', r = m; if side = 'R', r = n.

CHARACTER*1. Must be 'Q' or 'P'.vectIf vect = 'Q', then Q or QT is applied to C.If vect = 'P', then P or PT is applied to C.

CHARACTER*1. Must be 'L' or 'R'.sideIf side = 'L', multipliers are applied to C from the left.If side = 'R', they are applied to C from the right.

CHARACTER*1. Must be 'N' or 'T'.transIf trans = 'N', then Q or P is applied to C.If trans = 'T', then QT or PT is applied to C.

INTEGER. The number of rows in C.m

657


INTEGER. The number of columns in C.n



REAL for sormbra, c, workDOUBLE PRECISION for dormbr.Arrays:a(lda,*) is the array a as returned by ?gebrd.Its second dimension must be at least max(1, min(r,k)) forvect = 'Q', or max(1, r)) for vect = 'P'.c(ldc,*) holds the matrix C.Its second dimension must be at least max(1, n).work is a workspace array, its dimension max(1, lwork).


lda ≥ max(1, r) if vect = 'Q';

lda ≥ max(1, min(r,k)) if vect = 'P'.


REAL for sormbrtauDOUBLE PRECISION for dormbr.Array, DIMENSION at least max (1, min(r, k)).For vect = 'Q', the array tauq as returned by ?gebrd.For vect = 'P', the array taup as returned by ?gebrd.




658


Output Parameters

Overwritten by the product Q*C, QT*C, C*Q, C*QT, P*C, PT*C,C*P, or C*PT, as specified by vect, side, and trans.

c


work(1)




Specific details for the routine ormbr interface are the following:

Holds the matrix A of size (r,min(nq,k)) wherear = nq, if vect = 'Q',r = min(nq,k), if vect = 'P',nq = m, if side = 'L',nq = n, if side = 'R',k = m, if vect = 'P',k = n, if vect = 'Q'.

Holds the vector of length min(nq,k).tau





Application Notes


lwork = n*blocksize for side = 'L', or

lwork = m*blocksize for side = 'R',

659


where blocksize is a machine-dependent value (typically, 16 to 64) required for optimumperformance of the blocked algorithm.





The computed product differs from the exact product by a matrix E such that ||E||2 = O(ε)||C||2.

The total number of floating-point operations is approximately

2*n*k(2*m - k) if side = 'L' and m ≥ k;

2*m*k(2*n - k) if side = 'R' and n ≥ k;

2*m2*n if side = 'L' and m < k;

2*n2*m if side = 'R' and n < k.

The complex counterpart of this routine is ?unmbr.

660

?ungbrGenerates the complex unitary matrix Q or PHdetermined by ?gebrd.

Syntax

Fortran 77:

call cungbr(vect, m, n, k, a, lda, tau, work, lwork, info)

call zungbr(vect, m, n, k, a, lda, tau, work, lwork, info)

Fortran 95:

call ungbr(a, tau [,vect] [,info])

Description

The routine generates the whole or part of the unitary matrices Q and PH formed by the routinescgebrd/zgebrd. Use this routine after a call to cgebrd/zgebrd. All valid combinations ofarguments are described in Input Parameters;in most cases you need the following:

To compute the whole m-by-m matrix Q, use:

call ?ungbr('Q', m, m, n, a ... )

(note that the array a must have at least m columns).

To form the n leading columns of Q if m > n, use:

call ?ungbr('Q', m, n, n, a ... )

To compute the whole n-by-n matrix PH, use:

call ?ungbr('P', n, n, m, a ... )

(note that the array a must have at least n rows).

To form the m leading rows of PH if m < n, use:

call ?ungbr('P', m, n, m, a ... )

Input Parameters

CHARACTER*1. Must be 'Q' or 'P'.vectIf vect = 'Q', the routine generates the matrix Q.If vect = 'P', the routine generates the matrix PH.

661

INTEGER. The number of required rows of Q or PH.m

INTEGER. The number of required columns of Q or PH.n



For vect = 'Q': k ≤ n ≤ m if m > k, or m = n if m ≤ k.

For vect = 'P': k ≤ m ≤ n if n > k, or m = n if n ≤ k.

COMPLEX for cungbra, workDOUBLE COMPLEX for zungbr.Arrays:a(lda,*) is the array a as returned by ?gebrd.The second dimension of a must be at least max(1, n).work is a workspace array, its dimension max(1,lwork).


COMPLEX for cungbrtauDOUBLE COMPLEX for zungbr.For vect = 'Q', the array tauq as returned by ?gebrd.For vect = 'P', the array taup as returned by ?gebrd.The dimension of tau must be at least max(1, min(m, k))for vect = 'Q', or max(1, min(m, k)) for vect = 'P'.

INTEGER. The size of the work array.lworkConstraint: lwork < max(1, min(m, n)).If lwork = -1, then a workspace query is assumed; theroutine only calculates the optimal size of the work array,returns this value as the first entry of the work array, andno error message related to lwork is issued by xerbla.See Application Notes for the suggested value of lwork.

Output Parameters

Overwritten by the orthogonal matrix Q or PT (or the leadingrows or columns thereof) as specified by vect, m, and n.

a

662


work(1)




Specific details for the routine ungbr interface are the following:


Holds the vector of length min(m,k) wheretauk = m, if vect = 'P',k = n, if vect = 'Q'.


Application Notes

For better performance, try using lwork = min(m,n)*blocksize, where blocksize is amachine-dependent value (typically, 16 to 64) required for optimum performance of the blockedalgorithm.





663


= O(ε).

The approximate numbers of floating-point operations for the cases listed in Description areas follows:

To form the whole of Q:

(16/3)n(3m2 - 3m*n + n2) if m > n;

(16/3)m3 if m ≤ n.

To form the n leading columns of Q when m > n:

(8/3)n2(3m - n2) if m > n.

To form the whole of PT:

(16/3)n3 if m ≥ n;

(16/3)m(3n2 - 3m*n + m2) if m < n.

To form the m leading columns of PT when m < n:

(8/3)n2(3m - n2) if m > n.

The real counterpart of this routine is ?orgbr.

?unmbrMultiplies an arbitrary complex matrix by theunitary matrix Q or P determined by ?gebrd.

Syntax

Fortran 77:

call cunmbr(vect, side, trans, m, n, k, a, lda, tau, c, ldc, work, lwork,info)

call zunmbr(vect, side, trans, m, n, k, a, lda, tau, c, ldc, work, lwork,info)

Fortran 95:

call unmbr(a, tau, c [,vect] [,side] [,trans] [,info])

664

Description

Given an arbitrary complex matrix C, this routine forms one of the matrix products Q*C, QH*C,C*Q, C*QH, P*C, PH*C, C*P, or C*PH, where Q and P are orthogonal matrices computed by a callto cgebrd/zgebrd. The routine overwrites the product on C.

Input Parameters

In the descriptions below, r denotes the order of Q or PH:


CHARACTER*1. Must be 'Q' or 'P'.vectIf vect = 'Q', then Q or QH is applied to C.If vect = 'P', then P or PH is applied to C.

CHARACTER*1. Must be 'L' or 'R'.sideIf side = 'L', multipliers are applied to C from the left.If side = 'R', they are applied to C from the right.

CHARACTER*1. Must be 'N' or 'C'.transIf trans = 'N', then Q or P is applied to C.If trans = 'C', then QH or PH is applied to C.

INTEGER. The number of rows in C.m

INTEGER. The number of columns in C.n



COMPLEX for cunmbra, c, workDOUBLE COMPLEX for zunmbr.Arrays:a(lda,*) is the array a as returned by ?gebrd.Its second dimension must be at least max(1, min(r,k)) forvect = 'Q', or max(1, r)) for vect = 'P'.c(ldc,*) holds the matrix C.Its second dimension must be at least max(1, n).work is a workspace array, its dimension max(1,lwork).


665


lda ≥ max(1, r) if vect = 'Q';

lda ≥ max(1, min(r,k)) if vect = 'P'.


COMPLEX for cunmbrtauDOUBLE COMPLEX for zunmbr.Array, DIMENSION at least max (1, min(r, k)).For vect = 'Q', the array tauq as returned by ?gebrd.For vect = 'P', the array taup as returned by ?gebrd.

INTEGER. The size of the work array.lwork


lwork ≥ max(1, m) if side = 'R'.

lwork ≥ 1 if n=0 or m=0.

For optimum performance lwork ≥ max(1,n*nb) if side

= 'L', and lwork ≥ max(1,m*nb) if side = 'R', wherenb is the optimal blocksize. (nb = 0 if m = 0 or n =0.)If lwork = -1, then a workspace query is assumed; theroutine only calculates the optimal size of the work array,returns this value as the first entry of the work array, andno error message related to lwork is issued by xerbla.See Application Notes for the suggested value of lwork.

Output Parameters

Overwritten by the product Q*C, QH*C, C*Q, C*QH, P*C, PH*C,C*P, or C*PH, as specified by vect, side, and trans.

c


work(1)


666




Specific details for the routine unmbr interface are the following:

Holds the matrix A of size (r,min(nq,k)) wherear = nq, if vect = 'Q',r = min(nq,k), if vect = 'P',nq = m, if side = 'L',nq = n, if side = 'R',k = m, if vect = 'P',k = n, if vect = 'Q'.

Holds the vector of length min(nq,k).tau





Application Notes


lwork = n*blocksize for side = 'L', or

lwork = m*blocksize for side = 'R',

where blocksize is a machine-dependent value (typically, 16 to 64) required for optimumperformance of the blocked algorithm.




667


The total number of floating-point operations is approximately

8*n*k(2*m - k) if side = 'L' and m ≥ k;

8*m*k(2*n - k) if side = 'R' and n ≥ k;

8*m2*n if side = 'L' and m < k;

8*n2*m if side = 'R' and n < k.

The real counterpart of this routine is ?ormbr.

?bdsqrComputes the singular value decomposition of ageneral matrix that has been reduced to bidiagonalform.

Syntax

Fortran 77:

call sbdsqr(uplo, n, ncvt, nru, ncc, d, e, vt, ldvt, u, ldu, c, ldc, work,info)

call dbdsqr(uplo, n, ncvt, nru, ncc, d, e, vt, ldvt, u, ldu, c, ldc, work,info)

call cbdsqr(uplo, n, ncvt, nru, ncc, d, e, vt, ldvt, u, ldu, c, ldc, work,info)

call zbdsqr(uplo, n, ncvt, nru, ncc, d, e, vt, ldvt, u, ldu, c, ldc, work,info)

Fortran 95:

call rbdsqr(d, e [,vt] [,u] [,c] [,uplo] [,info])

call bdsqr(d, e [,vt] [,u] [,c] [,uplo] [,info])

668

Description

This routine computes the singular values and, optionally, the right and/or left singular vectorsfrom the Singular Value Decomposition (SVD) of a real n-by-n (upper or lower) bidiagonalmatrix B using the implicit zero-shift QR algorithm. The SVD of B has the form B = Q*S*PH

where S is the diagonal matrix of singular values, Q is an orthogonal matrix of left singularvectors, and P is an orthogonal matrix of right singular vectors. If left singular vectors arerequested, this subroutine actually returns U *Q instead of Q, and, if right singular vectors arerequested, this subroutine returns PH *VT instead of PH, for given real/complex input matricesU and VT. When U and VT are the orthogonal/unitary matrices that reduce a general matrix Ato bidiagonal form: A = U*B*VT, as computed by ?gebrd, then

A = (U*Q)*S*(PH*VT)

is the SVD of A. Optionally, the subroutine may also compute QH *C for a given real/complexinput matrix C.

See also ?lasq1, ?lasq2, ?lasq3, ?lasq4, ?lasq5, ?lasq6 used by this routine.

Input Parameters

CHARACTER*1. Must be 'U' or 'L'.uploIf uplo = 'U', B is an upper bidiagonal matrix.If uplo = 'L', B is a lower bidiagonal matrix.

INTEGER. The order of the matrix B (n ≥ 0).n

INTEGER. The number of columns of the matrix VT, that is,

the number of right singular vectors (ncvt ≥ 0).

ncvt

Set ncvt = 0 if no right singular vectors are required.

INTEGER. The number of rows in U, that is, the number of

left singular vectors (nru ≥ 0).

nru

Set nru = 0 if no left singular vectors are required.

INTEGER. The number of columns in the matrix C used for

computing the product QH*C (ncc ≥ 0). Set ncc = 0 if nomatrix C is supplied.

ncc

REAL for single-precision flavorsd, e, workDOUBLE PRECISION for double-precision flavors.Arrays:d(*) contains the diagonal elements of B.

669


The dimension of d must be at least max(1, n).e(*) contains the (n-1) off-diagonal elements of B.The dimension of e must be at least max(1, n). e(n) isused for workspace.work(*) is a workspace array.The dimension of work must be at least max(1, 2*n) ifncvt = nru = ncc = 0;max(1, 4*(n-1)) otherwise.

REAL for sbdsqrvt, u, cDOUBLE PRECISION for dbdsqrCOMPLEX for cbdsqrDOUBLE COMPLEX for zbdsqr.Arrays:vt(ldvt,*) contains an n-by-ncvt matrix VT.The second dimension of vt must be at least max(1, ncvt).vt is not referenced if ncvt = 0.u(ldu,*) contains an nru by n unit matrix U.The second dimension of u must be at least max(1, n).u is not referenced if nru = 0.c(ldc,*) contains the matrix C for computing the productQH*C.The second dimension of c must be at least max(1, ncc).The array is not referenced if ncc = 0.

INTEGER. The first dimension of vt. Constraints:ldvt

ldvt ≥ max(1, n) if ncvt > 0;

ldvt ≥ 1 if ncvt = 0.

INTEGER. The first dimension of u. Constraint:ldu

ldu ≥ max(1, nru).

INTEGER. The first dimension of c. Constraints:ldc

ldc ≥ max(1, n) if ncc > 0;ldc ≥ 1 otherwise.

Output Parameters

On exit, if info = 0, overwritten by the singular values indecreasing order (see info).

d

On exit, if info = 0, e is destroyed. See also info below.e

670


Overwritten by the product QH*C.c

On exit, this array is overwritten by PH *VT.vt

On exit, this array is overwritten by U *Q .u

INTEGER.infoIf info = 0, the execution is successful.If info = -i, the i-th parameter had an illegal value.If info = i, the algorithm failed to converge; i specifieshow many off-diagonals did not converge.In this case, d and e contain on exit the diagonal andoff-diagonal elements, respectively, of a bidiagonal matrixorthogonally equivalent to B.



Specific details for the routine bdsqr interface are the following:


Holds the vector of length (n).e

Holds the matrix VT of size (n, ncvt).vt

Holds the matrix U of size (nru,n).u

Holds the matrix C of size (n,ncc).c


If argument vt is present, then ncvt is equal to the number of columnsin matrix VT; otherwise, ncvt is set to zero.

ncvt

If argument u is present, then nru is equal to the number of rows inmatrix U; otherwise, nru is set to zero.

nru

If argument c is present, then ncc is equal to the number of columnsin matrix C; otherwise, ncc is set to zero.

ncc

Note that two variants of Fortran 95 interface for bdsqr routine are needed because of anambiguous choice between real and complex cases appear when vt, u, and c are omitted. Thus,the name rbdsqr is used in real cases (single or double precision), and the name bdsqr isused in complex cases (single or double precision).

671


Application Notes

Each singular value and singular vector is computed to high relative accuracy. However, thereduction to bidiagonal form (prior to calling the routine) may decrease the relative accuracyin the small singular values of the original matrix if its singular values vary widely in magnitude.

If si is an exact singular value of B, and si is the corresponding computed value, then

|si - σi| ≤ p*(m,n)*ε*σi

where p(m, n) is a modestly increasing function of m and n, and ≤ is the machine precision.

If only singular values are computed, they are computed more accurately than when somesingular vectors are also computed (that is, the function p(m, n) is smaller).

If ui is the corresponding exact left singular vector of B, and wi is the corresponding computed

left singular vector, then the angle θ(ui, wi) between them is bounded as follows:

θ(ui, wi) ≤ p(m,n)*ε / min i≠j(|σi - σj|/|σi + σj|).

Here mini≠j(|σi - σj|/|σi + σj|) is the relative gap between σi and the other singular

values. A similar error bound holds for the right singular vectors.

The total number of real floating-point operations is roughly proportional to n2 if only the singularvalues are computed. About 6n2*nru additional operations (12n2*nru for complex flavors) arerequired to compute the left singular vectors and about 6n2*ncvt operations (12n2*ncvt forcomplex flavors) to compute the right singular vectors.

?bdsdcComputes the singular value decomposition of areal bidiagonal matrix using a divide and conquermethod.

Syntax

Fortran 77:

call sbdsdc(uplo, compq, n, d, e, u, ldu, vt, ldvt, q, iq, work, iwork, info)

call dbdsdc(uplo, compq, n, d, e, u, ldu, vt, ldvt, q, iq, work, iwork, info)

Fortran 95:

call bdsdc(d, e [,u] [,vt] [,q] [,iq] [,uplo] [,info])

672


Description

This routine computes the Singular Value Decomposition (SVD) of a real n-by-n (upper or lower)

bidiagonal matrix B: B = U*Σ*VT, using a divide and conquer method, where Σ is a diagonalmatrix with non-negative diagonal elements (the singular values of B), and U and V areorthogonal matrices of left and right singular vectors, respectively. ?bdsdc can be used tocompute all singular values, and optionally, singular vectors or singular vectors in compactform.

This rotuine uses ?lasd0, ?lasd1, ?lasd2, ?lasd3, ?lasd4, ?lasd5, ?lasd6, ?lasd7, ?lasd8,?lasd9, ?lasda, ?lasdq, ?lasdt.

Input Parameters

CHARACTER*1. Must be 'U' or 'L'.uploIf uplo = 'U', B is an upper bidiagonal matrix.If uplo = 'L', B is a lower bidiagonal matrix.

CHARACTER*1. Must be 'N', 'P', or 'I'.compqIf compq = 'N', compute singular values only.If compq = 'P', compute singular values and computesingular vectors in compact form.If compq = 'I', compute singular values and singularvectors.


REAL for sbdsdcd, e, workDOUBLE PRECISION for dbdsdc.Arrays:d(*) contains the n diagonal elements of the bidiagonalmatrix B.The dimension of d must be at least max(1, n).e(*) contains the off-diagonal elements of the bidiagonalmatrix B.The dimension of e must be at least max(1, n).work(*) is a workspace array.The dimension of work must be at least:max(1, 4*n), if compq = 'N';max(1, 6*n), if compq = 'P';max(1, 3*n2+4*n), if compq = 'I'.

673


INTEGER. The first dimension of the output array u; ldu ≥1.

ldu

If singular vectors are desired, then ldu ≥ max(1, n).

INTEGER. The first dimension of the output array vt; ldvt

≥ 1.

ldvt

If singular vectors are desired, then ldvt ≥ max(1, n).

INTEGER. Workspace array, dimension at least max(1, 8*n).iwork

Output Parameters

If info = 0, overwritten by the singular values of B.d

On exit, e is overwritten.e

REAL for sbdsdcu, vt, qDOUBLE PRECISION for dbdsdc.Arrays: u(ldu,*), vt(ldvt,*), q(*).If compq = 'I', then on exit u contains the left singular

vectors of the bidiagonal matrix B, unless info ≠ 0(seeinfo). For other values of compq, u is not referenced.The second dimension of u must be at least max(1,n).if compq = 'I', then on exit vt contains the right singular

vectors of the bidiagonal matrix B, unless info ≠ 0(seeinfo). For other values of compq, vt is not referenced.The second dimension of vt must be at least max(1,n).If compq = 'P', then on exit, if info = 0, q and iq containthe left and right singular vectors in a compact form.Specifically, q contains all the REAL (for sbdsdc) or DOUBLEPRECISION (for dbdsdc) data for singular vectors. For othervalues of compq, q is not referenced. See Application notesfor details.

INTEGER.iqArray: iq(*).If compq = 'P', then on exit, if info = 0, q and iq containthe left and right singular vectors in a compact form.Specifically, iq contains all the INTEGER data for singularvectors. For other values of compq, iq is not referenced.See Application notes for details.

674


INTEGER.infoIf info = 0, the execution is successful.If info = -i, the i-th parameter had an illegal value.If info = i, the algorithm failed to compute a singularvalue. The update process of divide and conquer failed.



Specific details for the routine bdsdc interface are the following:



Holds the matrix U of size (n,n).u

Holds the matrix VT of size (n,n).vt

Holds the vector of length (ldq), whereq

ldq ≥ n*(11 + 2*smlsiz + 8*int(log_2(n/(smlsiz + 1))))and smlsiz is returned by ilaenv and is equal to the maximum sizeof the subproblems at the bottom of the computation tree (usuallyabout 25).

Restored based on the presence of arguments u, vt, q, and iq asfollows:

compq

compq = 'N', if none of u, vt, q, and iq are present,compq = 'I', if both u and vt are present. Arguments u and vt musteither be both present or both omitted,compq = 'P', if both q and iq are present. Arguments q and iq musteither be both present or both omitted.Note that there will be an error condition if all of u, vt, q, and iqarguments are present simultaneously.

Symmetric Eigenvalue Problems

Symmetric eigenvalue problems are posed as follows: given an n-by-n real symmetric or

complex Hermitian matrix A, find the eigenvaluesλ and the corresponding eigenvectorszthat satisfy the equation

675


Az = λz. (or, equivalently, zHA = λzH).

In such eigenvalue problems, all n eigenvalues are real not only for real symmetric but also forcomplex Hermitian matrices A, and there exists an orthonormal system of n eigenvectors. If Ais a symmetric or Hermitian positive-definite matrix, all eigenvalues are positive.

To solve a symmetric eigenvalue problem with LAPACK, you usually need to reduce the matrixto tridiagonal form and then solve the eigenvalue problem with the tridiagonal matrix obtained.LAPACK includes routines for reducing the matrix to a tridiagonal form by an orthogonal (orunitary) similarity transformation A = QTQH as well as for solving tridiagonal symmetriceigenvalue problems. These routines (for Fortran-77 interface) are listed in Table 4-3 . Respectiveroutine names in Fortran-95 interface are without the first symbol (see Routine NamingConventions).

There are different routines for symmetric eigenvalue problems, depending on whether youneed all eigenvectors or only some of them or eigenvalues only, whether the matrix A ispositive-definite or not, and so on.

These routines are based on three primary algorithms for computing eigenvalues andeigenvectors of symmetric problems: the divide and conquer algorithm, the QR algorithm, andbisection followed by inverse iteration. The divide and conquer algorithm is generally moreefficient and is recommended for computing all eigenvalues and eigenvectors. Furthermore, tosolve an eigenvalue problem using the divide and conquer algorithm, you need to call only oneroutine. In general, more than one routine has to be called if the QR algorithm or bisectionfollowed by inverse iteration is used.

Decision tree in Figure 4-2 will help you choose the right routine or sequence of routines foreigenvalue problems with real symmetric matrices. A similar decision tree for complex Hermitianmatrices is presented in Figure 4-3 .

676


Figure 4-2 Decision Tree: Real Symmetric Eigenvalue Problems

677


Figure 4-3 Decision Tree: Complex Hermitian Eigenvalue Problems

Table 4-3 Computational Routines for Solving Symmetric Eigenvalue Problems

Complex Hermitianmatrices

Real symmetric matricesOperation

?hetrd ?herdb?sytrd ?syrdbReduce to tridiagonal form A = QTQH

(full storage)

?hptrd?sptrdReduce to tridiagonal form A = QTQH

(packed storage)

678


Complex Hermitianmatrices

Real symmetric matricesOperation

?hbtrd?sbtrdReduce to tridiagonal form A = QTQH

(band storage).

?ungtr?orgtrGenerate matrix Q (full storage)

?upgtr?opgtrGenerate matrix Q (packed storage)

?unmtr?ormtrApply matrix Q (full storage)

?upmtr?opmtrApply matrix Q (packed storage)

?sterfFind all eigenvalues of a tridiagonalmatrix T

?steqr ?stedc?steqr ?stedcFind all eigenvalues and eigenvectorsof a tridiagonal matrix T

?pteqr?pteqrFind all eigenvalues and eigenvectorsof a tridiagonal positive-definite matrixT.

?stegr?stebz ?stegrFind selected eigenvalues of atridiagonal matrix T

?stein ?stegr?stein ?stegrFind selected eigenvectors of atridiagonal matrix T

?stemr?stemrFind selected eigenvalues andeigenvectors of f a real symmetrictridiagonal matrix T

?disna?disnaCompute the reciprocal conditionnumbers for the eigenvectors

679


?sytrdReduces a real symmetric matrix to tridiagonalform.

Syntax

Fortran 77:

call ssytrd(uplo, n, a, lda, d, e, tau, work, lwork, info)

call dsytrd(uplo, n, a, lda, d, e, tau, work, lwork, info)

Fortran 95:

call sytrd(a, tau [,uplo] [,info])

Description

This routine reduces a real symmetric matrix A to symmetric tridiagonal form T by an orthogonalsimilarity transformation: A = Q*T*QT. The orthogonal matrix Q is not formed explicitly but isrepresented as a product of n-1 elementary reflectors. Routines are provided for working withQ in this representation. (They are described later in this section .)

This routine calls ?latrd to reduce a real symmetric matrix to tridiagonal form by an orthogonalsimilarity transformation.

Input Parameters

CHARACTER*1. Must be 'U' or 'L'.uploIf uplo = 'U', a stores the upper triangular part of A.If uplo = 'L', a stores the lower triangular part of A.

INTEGER. The order of the matrix A (n ≥ 0).n

REAL for ssytrda, workDOUBLE PRECISION for dsytrd.a(lda,*) is an array containing either upper or lowertriangular part of the matrix A, as specified by uplo.The second dimension of a must be at least max(1, n).work is a workspace array, its dimension max(1, lwork).



680



Output Parameters

Overwritten by the tridiagonal matrix T and details of theorthogonal matrix Q, as specified by uplo.

a

REAL for ssytrdd, e, tauDOUBLE PRECISION for dsytrd.Arrays:d(*) contains the diagonal elements of the matrix T.The dimension of d must be at least max(1, n).e(*) contains the off-diagonal elements of T.The dimension of e must be at least max(1, n-1).tau(*) stores further details of the orthogonal matrix Q.The dimension of tau must be at least max(1, n-1).

If info=0, on exit work(1) contains the minimum value oflwork required for optimum performance. Use this lworkfor subsequent runs.

work(1)




Specific details for the routine sytrd interface are the following:


Holds the vector of length (n-1).tau




681


Application Notes






The computed matrix T is exactly similar to a matrix A + E, where ||E||2 = c(n)ε ||A||2,


The approximate number of floating-point operations is (4/3)n3.

After calling this routine, you can call the following:

to form the computed matrix Q explicitly?orgtr

to multiply a real matrix by Q.?ormtr

The complex counterpart of this routine is ?hetrd.

682


?syrdbReduces a real symmetric matrix to tridiagonalform with Successive Bandwidth Reductionapproach.

Syntax

Fortran 77:

call ssyrdb(jobz, uplo, n, kd, a, lda, d, e, tau, z, ldz, work, lwork, info)

call dsyrdb(jobz, uplo, n, kd, a, lda, d, e, tau, z, ldz, work, lwork, info)

Description

This routine reduces a real symmetric matrix A to symmetric tridiagonal form T by an orthogonalsimilarity transformation: A = Q*T*QT and optionally multuplies matrix Z by Q, or simply formsQ.

This routine reduces a full symmetric matrix to the banded symmetric form, and then to thetridiagonal symmetric form with a Successive Bandwidth Reduction approach after Prof.C.Bischof's works (see for instance, [Bischof92]). ?syrdb is functionally close to ?sytrd routinebut the tridiagonal form may differ from those obtained by ?sytrd. Unlike ?sytrd, theorthogonal matrix Q cannot be restored from the details of matrix A on exit.

Input Parameters

CHARACTER*1. Must be 'N' or 'V'.jobzIf jobz = 'N', then only A is reduced to T.If jobz = 'V', then A is reduced to T and Z contains Q onexit.If jobz = 'U', then A is reduced to T and Z contains ZQ onexit.



INTEGER. The bandwidth of the banded matrix B (kd ≥ 1).kd

REAL for ssyrdb.a,z, work

683


DOUBLE PRECISION for dsyrdb.a(lda,*) is an array containing either upper or lowertriangular part of the matrix A, as specified by uplo.The second dimension of a must be at least max(1, n).z(ldz,*), the second dimension of z must be at least max(1,n).If jobz = 'U', then the matrix z is multiplied by Q.If jobz = 'V', then z content on input is discarded.If jobz = 'N', then z is not referenced.work(lwork) is a workspace array.


INTEGER. The first dimension of z; at least max(1, n). Notreferenced if jobz = 'N'

ldz

INTEGER. The size of the work array (lwork ≥(2kd+1)n+kd).

lwork


Output Parameters

Overwritten by the banded matrix B and details of theorthogonal matrix QB to reduce A to B as specified by uplo.

a

On exit,zif jobz = 'U', then the matrix z is overwritten by zQ.If jobz = 'V', then z contains Q.If jobz = 'N', then z is not referenced.

DOUBLE PRECISION.d, e, tauArrays:d(*) contains the diagonal elements of the matrix T.The dimension of d must be at least max(1, n).e(*) contains the off-diagonal elements of T.The dimension of e must be at least max(1, n-1).tau(*) stores further details of the orthogonal matrix Q.The dimension of tau must be at least max(1, n-kd-1).

684



work(1)


Application Notes

For better performance, try using lwork = n*(3*kd+3).





For better performance, try using kd equal to 40 if n ≤ 2000 and 64 otherwise.

Try using ?syrdb instead of ?sytrd on large matrices obtaining only eigenvalues - when noeigenvectors are needed, especially in multi-threaded environment. ?syrdb becomes fasterbeginning approximately with n = 1000, and much faster at larger matrices with a betterscalability than ?sytrd.

Avoid applying ?syrdb for computing eigenvectors due to the two-step reduction, that is, thenumber of operations needed to apply orthogonal transformations to Z is doubled comparedto the traditional one-step reduction. In that case it is better to apply ?sytrd and?ormtr/?orgtr to obtain tridiagonal form along with the orthogonal transformation matrix Q.

685


?herdbReduces a complex Hermitian matrix to tridiagonalform with Successive Bandwidth Reductionapproach.

Syntax

Fortran 77:

call cherdb(jobz, uplo, n, kd, a, lda, d, e, tau, z, ldz, work, lwork, info)

call zherdb(jobz, uplo, n, kd, a, lda, d, e, tau, z, ldz, work, lwork, info)

Description

This routine reduces a complex Hermitian matrix A to symmetric tridiagonal form T by a unitarysimilarity transformation: A = Q*T*QT and optionally multuplies matrix Z by Q, or simply formsQ.

This routine reduces a full Hermitian matrix to the banded Hermitian form, and then to thetridiagonal symmetric form with a Successive Bandwidth Reduction approach after Prof.C.Bischof's works (see for instance, [Bischof92]). ?herdb is functionally close to ?hetrd routinebut the tridiagonal form may differ from those obtained by ?hetrd. Unlike ?hetrd, theorthogonal matrix Q cannot be restored from the details of matrix A on exit.

Input Parameters

CHARACTER*1. Must be 'N' or 'V'.jobzIf jobz = 'N', then only A is reduced to T.If jobz = 'V', then A is reduced to T and Z contains Q onexit.If jobz = 'U', then A is reduced to T and Z contains ZQ onexit.



INTEGER. The bandwidth of the banded matrix B (kd ≥ 1).kd

COMPLEX for cherdb.a,z, work

686


DOUBLE COMPLEX for zherdb.a(lda,*) is an array containing either upper or lowertriangular part of the matrix A, as specified by uplo.The second dimension of a must be at least max(1, n).z(ldz,*), the second dimension of z must be at least max(1,n).If jobz = 'U', then the matrix z is multiplied by Q.If jobz = 'V', then z content on input is discarded.If jobz = 'N', then z is not referenced.work(lwork) is a workspace array.


INTEGER. The first dimension of z; at least max(1, n). Notreferenced if jobz = 'N'

ldz

INTEGER. The size of the work array (lwork ≥(2kd+1)n+kd).

lwork


Output Parameters

Overwritten by the banded matrix B and details of theunitary matrix QB to reduce A to B as specified by uplo.

a

On exit,zif jobz = 'U', then the matrix z is overwritten by zQ.If jobz = 'V', then z contains Q.If jobz = 'N', then z is not referenced.

COMPLEX for cherdb.d, eDOUBLE COMPLEX for zherdb.Arrays:d(*) contains the diagonal elements of the matrix T.The dimension of d must be at least max(1, n).e(*) contains the off-diagonal elements of T.The dimension of e must be at least max(1, n-1).tau(*) stores further details of the orthogonal matrix Q.

687


The dimension of tau must be at least max(1, n-kd-1).

COMPLEX for cherdb.tauDOUBLE COMPLEX for zherdb.Array, DIMENSION at least max(1, n-1)Stores further details of the unitary matrix QB. The dimensionof tau must be at least max(1, n-kd-1).


work(1)


Application Notes

For better performance, try using lwork = n*(3*kd+3).


If you choose the first option and set any of admissible lwork size, which is no less than theminimal value described, the routine completes the task, though probably not so fast as witha recommended workspace, and provides the recommended workspace in the first element ofthe corresponding array (work) on exit. Use this value (work(1)) for subsequent runs.



For better performance, try using kd equal to 40 if n ≤ 2000 and 64 otherwise.

Try using ?herdb instead of ?hetrd on large matrices obtaining only eigenvalues - when noeigenvectors are needed, especially in multi-threaded environment. ?herdb becomes fasterbeginning approximately with n = 1000, and much faster at larger matrices with a betterscalability than ?hetrd.

688


Avoid applying ?herdb for computing eigenvectors due to the two-step reduction, that is, thenumber of operations needed to apply orthogonal transformations to Z is doubled comparedto the traditional one-step reduction. In that case it is better to apply ?hetrd and?unmtr/?ungtr to obtain tridiagonal form along with the unitary transformation matrix Q.

?orgtrGenerates the real orthogonal matrix Q determinedby ?sytrd.

Syntax

Fortran 77:

call sorgtr(uplo, n, a, lda, tau, work, lwork, info)

call dorgtr(uplo, n, a, lda, tau, work, lwork, info)

Fortran 95:

call orgtr(a, tau [,uplo] [,info])

Description

The routine explicitly generates the n-by-n orthogonal matrix Q formed by ?sytrd when reducinga real symmetric matrix A to tridiagonal form: A = Q*T*QT. Use this routine after a call to?sytrd.

Input Parameters

CHARACTER*1. Must be 'U' or 'L'.uploUse the same uplo as supplied to ?sytrd.

INTEGER. The order of the matrix Q (n ≥ 0).n

REAL for sorgtra, tau, workDOUBLE PRECISION for dorgtr.Arrays:a(lda,*) is the array a as returned by ?sytrd.The second dimension of a must be at least max(1, n).tau(*) is the array tau as returned by ?sytrd.The dimension of tau must be at least max(1, n-1).work is a workspace array, its dimension max(1,lwork).

689





Output Parameters

Overwritten by the orthogonal matrix Q.a


work(1)




Specific details for the routine orgtr interface are the following:




Application Notes

For better performance, try using lwork = (n-1)*blocksize, where blocksize is amachine-dependent value (typically, 16 to 64) required for optimum performance of the blockedalgorithm.


690






= O(ε), where e is the machine precision.


The complex counterpart of this routine is ?ungtr.

?ormtrMultiplies a real matrix by the real orthogonalmatrix Q determined by ?sytrd.

Syntax

Fortran 77:

call sormtr(side, uplo, trans, m, n, a, lda, tau, c, ldc, work, lwork, info)

call dormtr(side, uplo, trans, m, n, a, lda, tau, c, ldc, work, lwork, info)

Fortran 95:

call ormtr(a, tau, c [,side] [,uplo] [,trans] [,info])

Description

The routine multiplies a real matrix C by Q or QT, where Q is the orthogonal matrix Q formed by?sytrd when reducing a real symmetric matrix A to tridiagonal form: A = Q*T*QT. Use thisroutine after a call to ?sytrd.

Depending on the parameters side and trans, the routine can form one of the matrix productsQ*C, QT*C, C*Q, or C*QT(overwriting the result on C).

691


Input Parameters

In the descriptions below, r denotes the order of Q:



CHARACTER*1. Must be 'U' or 'L'.uploUse the same uplo as supplied to ?sytrd.




REAL for sormtra, c, tau, workDOUBLE PRECISION for dormtra(lda,*) and tau are the arrays returned by ?sytrd.The second dimension of a must be at least max(1, r).The dimension of tau must be at least max(1, r-1).c(ldc,*) contains the matrix C.The second dimension of c must be at least max(1, n)work is a workspace array, its dimension max(1,lwork).

INTEGER. The first dimension of a; lda ≥ max(1, r).lda

INTEGER. The first dimension of c; ldc ≥ max(1, n).ldc




692


Output Parameters


c


work(1)




Specific details for the routine ormtr interface are the following:

Holds the matrix A of size (r,r).ar = m if side = 'L'.r = n if side = 'R'.

Holds the vector of length (r-1).tau





Application Notes

For better performance, try using lwork = n*blocksize for side = 'L', or lwork =m*blocksize for side = 'R', where blocksize is a machine-dependent value (typically, 16to 64) required for optimum performance of the blocked algorithm.



693





The total number of floating-point operations is approximately 2*m2*n, if side = 'L', or2*n2*m, if side = 'R'.

The complex counterpart of this routine is ?unmtr.

?hetrdReduces a complex Hermitian matrix to tridiagonalform.

Syntax

Fortran 77:

call chetrd(uplo, n, a, lda, d, e, tau, work, lwork, info)

call zhetrd(uplo, n, a, lda, d, e, tau, work, lwork, info)

Fortran 95:

call hetrd(a, tau [,uplo] [,info])

Description

This routine reduces a complex Hermitian matrix A to symmetric tridiagonal form T by a unitarysimilarity transformation: A = Q*T*QH. The unitary matrix Q is not formed explicitly but isrepresented as a product of n-1 elementary reflectors. Routines are provided to work with Q inthis representation. (They are described later in this section .)

This routine calls ?latrd to reduce a complex Hermitian matrix A to Hermitian tridiagonal formby a unitary similarity transformation.

694


Input Parameters



COMPLEX for chetrda, workDOUBLE COMPLEX for zhetrd.a(lda,*) is an array containing either upper or lowertriangular part of the matrix A, as specified by uplo.The second dimension of a must be at least max(1, n).work is a workspace array, its dimension max(1, lwork).




Output Parameters

Overwritten by the tridiagonal matrix T and details of theunitary matrix Q, as specified by uplo.

a

REAL for chetrdd, eDOUBLE PRECISION for zhetrd.Arrays:d(*) contains the diagonal elements of the matrix T.The dimension of d must be at least max(1, n).e(*) contains the off-diagonal elements of T.The dimension of e must be at least max(1, n-1).

COMPLEX for chetrd DOUBLE COMPLEX for zhetrd.tauArray, DIMENSION at least max(1, n-1). Stores furtherdetails of the unitary matrix Q.

695



work(1)




Specific details for the routine hetrd interface are the following:






Application Notes






696






to form the computed matrix Q explicitly?ungtr

to multiply a complex matrix by Q.?unmtr

The real counterpart of this routine is ?sytrd.

?ungtrGenerates the complex unitary matrix Qdetermined by ?hetrd.

Syntax

Fortran 77:

call cungtr(uplo, n, a, lda, tau, work, lwork, info)

call zungtr(uplo, n, a, lda, tau, work, lwork, info)

Fortran 95:

call ungtr(a, tau [,uplo] [,info])

The routine explicitly generates the n-by-n unitary matrix Q formed by ?hetrd when reducinga complex Hermitian matrix A to tridiagonal form: A = Q*T*QH. Use this routine after a call to?hetrd.

Input Parameters

CHARACTER*1. Must be 'U' or 'L'.uploUse the same uplo as supplied to ?hetrd.


COMPLEX for cungtra, tau, workDOUBLE COMPLEX for zungtr.Arrays:a(lda,*) is the array a as returned by ?hetrd.The second dimension of a must be at least max(1, n).

697


tau(*) is the array tau as returned by ?hetrd.The dimension of tau must be at least max(1, n-1).work is a workspace array, its dimension max(1, lwork).




Output Parameters

Overwritten by the unitary matrix Q.a


work(1)




Specific details for the routine ungtr interface are the following:




Application Notes

For better performance, try using lwork = (n-1)*blocksize, where blocksize is amachine-dependent value (typically, 16 to 64) required for optimum performance of the blockedalgorithm.

698






The computed matrix Q differs from an exactly unitary matrix by a matrix E such that ||E||2

= O(ε), where e is the machine precision.


The real counterpart of this routine is ?orgtr.

?unmtrMultiplies a complex matrix by the complex unitarymatrix Q determined by ?hetrd.

Syntax

Fortran 77:

call cunmtr(side, uplo, trans, m, n, a, lda, tau, c, ldc, work, lwork, info)

call zunmtr(side, uplo, trans, m, n, a, lda, tau, c, ldc, work, lwork, info)

Fortran 95:

call unmtr(a, tau, c [,side] [,uplo] [,trans] [,info])

Description

The routine multiplies a complex matrix C by Q or QH, where Q is the unitary matrix Q formedby ?hetrd when reducing a complex Hermitian matrix A to tridiagonal form: A = Q*T*QH. Usethis routine after a call to ?hetrd.

699



Input Parameters




CHARACTER*1. Must be 'U' or 'L'.uploUse the same uplo as supplied to ?hetrd.

CHARACTER*1. Must be either 'N' or 'T'.transIf trans = 'N', the routine multiplies C by Q.If trans = 'T', the routine multiplies C by QH.



COMPLEX for cunmtra, c, tau, workDOUBLE COMPLEX for zunmtr.a(lda,*) and tau are the arrays returned by ?hetrd.The second dimension of a must be at least max(1, r).The dimension of tau must be at least max(1, r-1).c(ldc,*) contains the matrix C.The second dimension of c must be at least max(1, n)work is a workspace array, its dimension max(1,lwork).

INTEGER. The first dimension of a; lda ≥ max(1, r).lda




lwork ≥ max(1, m) if side = 'R'.

700



Output Parameters


c


work(1)




Specific details for the routine unmtr interface are the following:







Application Notes

For better performance, try using lwork = n*blocksize (for side = 'L') or lwork =m*blocksize (for side = 'R') where blocksize is a machine-dependent value (typically,16 to 64) required for optimum performance of the blocked algorithm.

701






The computed product differs from the exact product by a matrix E such that ||E||2 = O(ε)

||C||2, where ε is the machine precision.

The total number of floating-point operations is approximately 8*m2*n if side = 'L' or 8*n2*mif side = 'R'.

The real counterpart of this routine is ?ormtr.

?sptrdReduces a real symmetric matrix to tridiagonalform using packed storage.

Syntax

Fortran 77:

call ssptrd(uplo, n, ap, d, e, tau, info)

call dsptrd(uplo, n, ap, d, e, tau, info)

Fortran 95:

call sptrd(a, tau [,uplo] [,info])

702


Description

This routine reduces a packed real symmetric matrix A to symmetric tridiagonal form T by anorthogonal similarity transformation: A = Q*T*QT. The orthogonal matrix Q is not formedexplicitly but is represented as a product of n-1 elementary reflectors. Routines are providedfor working with Q in this representation. (They are described later in this section .)

Input Parameters

CHARACTER*1. Must be 'U' or 'L'.uploIf uplo = 'U', ap stores the packed upper triangle of A.If uplo = 'L', ap stores the packed lower triangle of A.


REAL for ssptrdapDOUBLE PRECISION for dsptrd.Array, DIMENSION at least max(1, n(n+1)/2). Contains eitherupper or lower triangle of A (as specified by uplo) in packedform.

Output Parameters


ap

REAL for ssptrdd, e, tauDOUBLE PRECISION for dsptrd.Arrays:d(*) contains the diagonal elements of the matrix T.The dimension of d must be at least max(1, n).e(*) contains the off-diagonal elements of T.The dimension of e must be at least max(1, n-1).tau(*) stores further details of the matrix Q.The dimension of tau must be at least max(1, n-1).


703




Specific details for the routine sptrd interface are the following:


a





Application Notes

The computed matrix T is exactly similar to a matrix A + E, where ||E||2 = c(n)ε ||A||2,c(n) is a modestly increasing function of n, and e is the machine precision. The approximatenumber of floating-point operations is (4/3)n3.


to form the computed matrix Q explicitly?opgtr

to multiply a real matrix by Q.?opmtr

The complex counterpart of this routine is ?hptrd.

?opgtrGenerates the real orthogonal matrix Q determinedby ?sptrd.

Syntax

Fortran 77:

call sopgtr(uplo, n, ap, tau, q, ldq, work, info)

call dopgtr(uplo, n, ap, tau, q, ldq, work, info)

704


Fortran 95:

call opgtr(a, tau, q [,uplo] [,info])

Description

The routine explicitly generates the n-by-n orthogonal matrix Q formed by ?sptrd when reducinga packed real symmetric matrix A to tridiagonal form: A = Q*T*QT. Use this routine after a callto ?sptrd.

Input Parameters

CHARACTER*1. Must be 'U' or 'L'. Use the same uplo assupplied to ?sptrd.

uplo


REAL for sopgtrap, tauDOUBLE PRECISION for dopgtr.Arrays ap and tau, as returned by ?sptrd.The dimension of ap must be at least max(1, n(n+1)/2).The dimension of tau must be at least max(1, n-1).

INTEGER. The first dimension of the output array q; at leastmax(1, n).

ldq

REAL for sopgtrworkDOUBLE PRECISION for dopgtr.Workspace array, DIMENSION at least max(1, n-1).

Output Parameters

REAL for sopgtrqDOUBLE PRECISION for dopgtr.Array, DIMENSION (ldq,*).Contains the computed matrix Q.The second dimension of q must be at least max(1, n).


705




Specific details for the routine opgtr interface are the following:


a


Holds the matrix Q of size (n,n).q


Application Notes


= O(ε), where ε is the machine precision.


The complex counterpart of this routine is ?upgtr.

?opmtrMultiplies a real matrix by the real orthogonalmatrix Q determined by ?sptrd.

Syntax

Fortran 77:

call sopmtr(side, uplo, trans, m, n, ap, tau, c, ldc, work, info)

call dopmtr(side, uplo, trans, m, n, ap, tau, c, ldc, work, info)

Fortran 95:

call opmtr(a, tau, c [,side] [,uplo] [,trans] [,info])

706


Description

The routine multiplies a real matrix C by Q or QT, where Q is the orthogonal matrix Q formed by?sptrd when reducing a packed real symmetric matrix A to tridiagonal form: A = QTQT. Usethis routine after a call to ?sptrd.

Depending on the parameters side and trans, the routine can form one of the matrix productsQ

*C, QT*C, C*Q, or C*QT (overwriting the result on C).

Input Parameters




CHARACTER*1. Must be 'U' or 'L'.uploUse the same uplo as supplied to ?sptrd.




REAL for sopmtrap, tau, c, workDOUBLE PRECISION for dopmtr.ap and tau are the arrays returned by ?sptrd.The dimension of ap must be at least max(1, r(r+1)/2).The dimension of tau must be at least max(1, r-1).c(ldc,*) contains the matrix C.The second dimension of c must be at least max(1, n)work(*) is a workspace array.The dimension of work must be at leastmax(1, n) if side = 'L';max(1, m) if side = 'R'.


707


Output Parameters


c




Specific details for the routine opmtr interface are the following:

Stands for argument ap in Fortan 77 interface. Holds the array A of size(r*(r+1)/2), where

a

r = m if side = 'L'.r = n if side = 'R'.






Application Notes

The computed product differs from the exact product by a matrix E such that ||E||2 = O(ε)

||C||2, where ε is the machine precision.


The complex counterpart of this routine is ?upmtr.

708


?hptrdReduces a complex Hermitian matrix to tridiagonalform using packed storage.

Syntax

Fortran 77:

call chptrd(uplo, n, ap, d, e, tau, info)

call zhptrd(uplo, n, ap, d, e, tau, info)

Fortran 95:

call hptrd(a, tau [,uplo] [,info])

Description

This routine reduces a packed complex Hermitian matrix A to symmetric tridiagonal form T bya unitary similarity transformation: A = Q*T*QH. The unitary matrix Q is not formed explicitlybut is represented as a product of n-1 elementary reflectors. Routines are provided for workingwith Q in this representation. They are described later in this section .

Input Parameters

CHARACTER*1. Must be 'U' or 'L'.uploIf uplo = 'U', ap stores the packed upper triangle of A.If uplo = 'L', ap stores the packed lower triangle of A.


COMPLEX for chptrdapDOUBLE COMPLEX for zhptrd.Array, DIMENSION at least max(1, n(n+1)/2). Contains eitherupper or lower triangle of A (as specified by uplo) in packedform.

Output Parameters


ap

REAL for chptrdd, eDOUBLE PRECISION for zhptrd.

709


Arrays:d(*) contains the diagonal elements of the matrix T.The dimension of d must be at least max(1, n).e(*) contains the off-diagonal elements of T.The dimension of e must be at least max(1, n-1).

COMPLEX for chptrdtauDOUBLE COMPLEX for zhptrd.Arrays, DIMENSION at least max(1, n-1). Contains furtherdetails of the orthogonal matrix Q.




Specific details for the routine hptrd interface are the following:


a





Application Notes





to form the computed matrix Q explicitly?upgtr

to multiply a complex matrix by Q.?upmtr

The real counterpart of this routine is ?sptrd.

710


?upgtrGenerates the complex unitary matrix Qdetermined by ?hptrd.

Syntax

Fortran 77:

call cupgtr(uplo, n, ap, tau, q, ldq, work, info)

call zupgtr(uplo, n, ap, tau, q, ldq, work, info)

Fortran 95:

call upgtr(a, tau, q [,uplo] [,info])

Description

The routine explicitly generates the n-by-n unitary matrix Q formed by ?hptrd when reducinga packed complex Hermitian matrix A to tridiagonal form: A = Q*T*QH. Use this routine after acall to ?hptrd.

Input Parameters

CHARACTER*1. Must be 'U' or 'L'. Use the same uplo assupplied to ?sptrd.

uplo


COMPLEX for cupgtrap, tauDOUBLE COMPLEX for zupgtr.Arrays ap and tau, as returned by ?hptrd.The dimension of ap must be at least max(1, n(n+1)/2).The dimension of tau must be at least max(1, n-1).

INTEGER. The first dimension of the output array q;ldqat least max(1, n).

COMPLEX for cupgtrworkDOUBLE COMPLEX for zupgtr.Workspace array, DIMENSION at least max(1, n-1).

711


Output Parameters

COMPLEX for cupgtrqDOUBLE COMPLEX for zupgtr.Array, DIMENSION (ldq,*).Contains the computed matrix Q.The second dimension of q must be at least max(1, n).




Specific details for the routine upgtr interface are the following:


a




Application Notes


= O(ε), where ε is the machine precision.


The real counterpart of this routine is ?opgtr.

712


?upmtrMultiplies a complex matrix by the unitary matrixQ determined by ?hptrd.

Syntax

Fortran 77:

call cupmtr(side, uplo, trans, m, n, ap, tau, c, ldc, work, info)

call zupmtr(side, uplo, trans, m, n, ap, tau, c, ldc, work, info)

Fortran 95:

call upmtr(a, tau, c [,side] [,uplo] [,trans] [,info])

Description

The routine multiplies a complex matrix C by Q or QH, where Q is the unitary matrix Q formedby ?hptrd when reducing a packed complex Hermitian matrix A to tridiagonal form: A = Q*T*QH.Use this routine after a call to ?hptrd.


Input Parameters




CHARACTER*1. Must be 'U' or 'L'.uploUse the same uplo as supplied to ?hptrd.

CHARACTER*1. Must be either 'N' or 'T'.transIf trans = 'N', the routine multiplies C by Q.If trans = 'T', the routine multiplies C by QH.



713


COMPLEX for cupmtrap, tau, c,DOUBLE COMPLEX for zupmtr.ap and tau are the arrays returned by ?hptrd.The dimension of ap must be at least max(1, r(r+1)/2).The dimension of tau must be at least max(1, r-1).c(ldc,*) contains the matrix C.The second dimension of c must be at least max(1, n)work(*) is a workspace array.The dimension of work must be at leastmax(1, n) if side = 'L';max(1, m) if side = 'R'.


Output Parameters


c




Specific details for the routine upmtr interface are the following:

Stands for argument ap in Fortan 77 interface. Holds the array A of size(r*(r+1)/2), where

a

r = m if side = 'L'.r = n if side = 'R'.




Must be 'U' or 'L'.The default value is 'U'.uplo


714


Application Notes

The computed product differs from the exact product by a matrix E such that ||E||2 = Oε||C||2, where e is the machine precision.


The real counterpart of this routine is ?opmtr.

?sbtrdReduces a real symmetric band matrix totridiagonal form.

Syntax

Fortran 77:

call ssbtrd(vect, uplo, n, kd, ab, ldab, d, e, q, ldq, work, info)

call dsbtrd(vect, uplo, n, kd, ab, ldab, d, e, q, ldq, work, info)

Fortran 95:

call sbtrd(a [, q] [,vect] [,uplo] [,info])

Description

This routine reduces a real symmetric band matrix A to symmetric tridiagonal form T by anorthogonal similarity transformation: A = Q*T*QT. The orthogonal matrix Q is determined asa product of Givens rotations.

If required, the routine can also form the matrix Q explicitly.

Input Parameters

CHARACTER*1. Must be 'V' or 'N'.vectIf vect = 'V', the routine returns the explicit matrix Q.If vect = 'N', the routine does not return Q.

CHARACTER*1. Must be 'U' or 'L'.uploIf uplo = 'U', ab stores the upper triangular part of A.If uplo = 'L', ab stores the lower triangular part of A.

715


INTEGER. The number of super- or sub-diagonals in Akd(kd < 0).

REAL for ssbtrdab, workDOUBLE PRECISION for dsbtrd.ab (ldab,*) is an array containing either upper or lowertriangular part of the matrix A (as specified by uplo) in bandstorage format.The second dimension of ab must be at least max(1, n).work(*) is a workspace array.The dimension of work must be at least max(1, n).

INTEGER. The first dimension of ab; at least kd+1.ldab

INTEGER. The first dimension of q. Constraints:ldq

ldq ≥ max(1, n) if vect = 'V';

ldq ≥ 1 if vect = 'N'.

Output Parameters

On exit, the array ab is overwritten.ab

REAL for ssbtrdd, e, qDOUBLE PRECISION for dsbtrd.Arrays:d(*) contains the diagonal elements of the matrix T.The dimension of d must be at least max(1, n).e(*) contains the off-diagonal elements of T.The dimension of e must be at least max(1, n-1).q(ldq,*) is not referenced if vect = 'N'.If vect = 'V', q contains the n-by-n matrix Q.The second dimension of q must be:at least max(1, n) if vect = 'V';at least 1 if vect = 'N'.


716



Specific details for the routine sbtrd interface are the following:


a





If omitted, this argument is restored based on the presence of argumentq as follows: vect = 'V', if q is present, vect = 'N', if q is omitted.

vect

If present, vect must be equal to 'V' or 'U' and the argument q mustalso be present. Note that there will be an error condition if vect ispresent and q omitted.

Application Notes


c(n) is a modestly increasing function of n, and ε is the machine precision. The computed matrix

Q differs from an exactly orthogonal matrix by a matrix E such that ||E||2 = O(ε).

The total number of floating-point operations is approximately 6n2*kd if vect = 'N', with3n3*(kd-1)/kd additional operations if vect = 'V'.

The complex counterpart of this routine is ?hbtrd.

717


?hbtrdReduces a complex Hermitian band matrix totridiagonal form.

Syntax

Fortran 77:

call chbtrd(vect, uplo, n, kd, ab, ldab, d, e, q, ldq, work, info)

call zhbtrd(vect, uplo, n, kd, ab, ldab, d, e, q, ldq, work, info)

Fortran 95:

call hbtrd(a [, q] [,vect] [,uplo] [,info])

Description

This routine reduces a complex Hermitian band matrix A to symmetric tridiagonal form T by aunitary similarity transformation: A = Q*T*QH. The unitary matrix Q is determined as a productof Givens rotations.

If required, the routine can also form the matrix Q explicitly.

Input Parameters

CHARACTER*1. Must be 'V' or 'N'.vectIf vect = 'V', the routine returns the explicit matrix Q.If vect = 'N', the routine does not return Q.



INTEGER. The number of super- or sub-diagonals in Akd

(kd ≥ 0).

COMPLEX for chbtrdab, workDOUBLE COMPLEX for zhbtrd.ab (ldab,*) is an array containing either upper or lowertriangular part of the matrix A (as specified by uplo) in bandstorage format.

718


The second dimension of ab must be at least max(1, n).work(*) is a workspace array.The dimension of work must be at least max(1, n).

INTEGER. The first dimension of ab; at least kd+1.ldab

INTEGER. The first dimension of q. Constraints:ldq

ldq ≥ max(1, n) if vect = 'V';

ldq ≥ 1 if vect = 'N'.

Output Parameters

On exit, the array ab is overwritten.ab

REAL for chbtrdd, eDOUBLE PRECISION for zhbtrd.Arrays:d(*) contains the diagonal elements of the matrix T.The dimension of d must be at least max(1, n).e(*) contains the off-diagonal elements of T.The dimension of e must be at least max(1, n-1).

COMPLEX for chbtrdqDOUBLE COMPLEX for zhbtrd.Array, DIMENSION (ldq,*).If vect = 'N', q is not referenced.If vect = 'V', q contains the n-by-n matrix Q.The second dimension of q must be:at least max(1, n) if vect = 'V';at least 1 if vect = 'N'.




Specific details for the routine hbtrd interface are the following:

719



a





If omitted, this argument is restored based on the presence of argumentq as follows: vect = 'V', if q is present, vect = 'N', if q is omitted.

vect

If present, vect must be equal to 'V' or 'U' and the argument q mustalso be present. Note that there will be an error condition if vect ispresent and q omitted.

Application Notes


c(n) is a modestly increasing function of n, and ε is the machine precision. The computed matrix

Q differs from an exactly unitary matrix by a matrix E such that ||E||2 = O(ε).

The total number of floating-point operations is approximately 20n2*kd if vect = 'N', with10n3*(kd-1)/kd additional operations if vect = 'V'.

The real counterpart of this routine is ?sbtrd.

?sterfComputes all eigenvalues of a real symmetrictridiagonal matrix using QR algorithm.

Syntax

Fortran 77:

call ssterf(n, d, e, info)

call dsterf(n, d, e, info)

Fortran 95:

call sterf(d, e [,info])

720


Description

This routine computes all the eigenvalues of a real symmetric tridiagonal matrix T (which canbe obtained by reducing a symmetric or Hermitian matrix to tridiagonal form). The routine usesa square-root-free variant of the QR algorithm.

If you need not only the eigenvalues but also the eigenvectors, call ?steqr.

Input Parameters

INTEGER. The order of the matrix T (n ≥ 0).n

REAL for ssterfd, eDOUBLE PRECISION for dsterf.Arrays:d(*) contains the diagonal elements of T.The dimension of d must be at least max(1, n).e(*) contains the off-diagonal elements of T.The dimension of e must be at least max(1, n-1).

Output Parameters

The n eigenvalues in ascending order, unless info > 0.dSee also info.

On exit, the array is overwritten; see info.e

INTEGER.infoIf info = 0, the execution is successful.If info = i, the algorithm failed to find all the eigenvaluesafter 30n iterations:i off-diagonal elements have not converged to zero. Onexit, d and e contain, respectively, the diagonal andoff-diagonal elements of a tridiagonal matrix orthogonallysimilar to T.If info = -i, the i-th parameter had an illegal value.



Specific details for the routine sterf interface are the following:

721




Application Notes

The computed eigenvalues and eigenvectors are exact for a matrix T + E such that ||E||2 =

O(ε) ||T||2, where ε is the machine precision.

If λi is an exact eigenvalue, and mi is the corresponding computed value, then

|μi - λi| ≤ c(n)ε ||T||2

where c(n) is a modestly increasing function of n.

The total number of floating-point operations depends on how rapidly the algorithm converges.Typically, it is about 14n2.

?steqrComputes all eigenvalues and eigenvectors of asymmetric or Hermitian matrix reduced totridiagonal form (QR algorithm).

Syntax

Fortran 77:

call ssteqr(compz, n, d, e, z, ldz, work, info)

call dsteqr(compz, n, d, e, z, ldz, work, info)

call csteqr(compz, n, d, e, z, ldz, work, info)

call zsteqr(compz, n, d, e, z, ldz, work, info)

Fortran 95:

call rsteqr(d, e [,z] [,compz] [,info])

call steqr(d, e [,z] [,compz] [,info])

722


Description

This routine computes all the eigenvalues and (optionally) all the eigenvectors of a real symmetrictridiagonal matrix T. In other words, the routine can compute the spectral factorization: T =

ZΛZT. Here Λ is a diagonal matrix whose diagonal elements are the eigenvalues λi; Z is anorthogonal matrix whose columns are eigenvectors. Thus,

Tzi = λizi for i = 1, 2, ..., n.

The routine normalizes the eigenvectors so that ||zi||2 = 1.

You can also use the routine for computing the eigenvalues and eigenvectors of an arbitraryreal symmetric (or complex Hermitian) matrix A reduced to tridiagonal form T: A = Q*T*QH.

In this case, the spectral factorization is as follows: A = Q*T*QH = (Q*Z)Λ(Q*Z)H. Beforecalling ?steqr, you must reduce A to tridiagonal form and generate the explicit matrix Q bycalling the following routines:

for complex matrices:for real matrices:

?hetrd, ?ungtr?sytrd, ?orgtrfull storage

?hptrd, ?upgtr?sptrd, ?opgtrpacked storage

?hbtrd (vect='V')?sbtrd (vect='V')band storage

If you need eigenvalues only, it's more efficient to call ?sterf. If T is positive-definite, ?pteqrcan compute small eigenvalues more accurately than ?steqr.

To solve the problem by a single call, use one of the divide and conquer routines ?stevd,?syevd, ?spevd, or ?sbevd for real symmetric matrices or ?heevd, ?hpevd, or ?hbevd forcomplex Hermitian matrices.

Input Parameters

CHARACTER*1. Must be 'N' or 'I' or 'V'.compzIf compz = 'N', the routine computes eigenvalues only.If compz = 'I', the routine computes the eigenvalues andeigenvectors of the tridiagonal matrix T.If compz = 'V', the routine computes the eigenvalues andeigenvectors of A (and the array z must contain the matrixQ on entry).

723



REAL for single-precision flavorsd, e, workDOUBLE PRECISION for double-precision flavors.Arrays:d(*) contains the diagonal elements of T.The dimension of d must be at least max(1, n).e(*) contains the off-diagonal elements of T.The dimension of e must be at least max(1, n-1).work(*) is a workspace array.The dimension of work must be:at least 1 if compz = 'N';at least max(1, 2*n-2) if compz = 'V' or 'I'.

REAL for ssteqrzDOUBLE PRECISION for dsteqrCOMPLEX for csteqrDOUBLE COMPLEX for zsteqr.Array, DIMENSION (ldz, *)If compz = 'N' or 'I', z need not be set.If vect = 'V', z must contain the n-by-n matrix Q.The second dimension of z must be:at least 1 if compz = 'N';at least max(1, n) if compz = 'V' or 'I'.work (lwork) is a workspace array.

INTEGER. The first dimension of z. Constraints:ldz

ldz ≥ 1 if compz = 'N';

ldz ≥ max(1, n) if compz = 'V' or 'I'.

Output Parameters

The n eigenvalues in ascending order, unless info > 0.dSee also info.


If info = 0, contains the n orthonormal eigenvectors,stored by columns. (The i-th column corresponds to theith eigenvalue.)

z

INTEGER.info

724


If info = 0, the execution is successful.If info = i, the algorithm failed to find all the eigenvaluesafter 30n iterations: i off-diagonal elements have notconverged to zero. On exit, d and e contain, respectively,the diagonal and off-diagonal elements of a tridiagonalmatrix orthogonally similar to T.If info = -i, the i-th parameter had an illegal value.



Specific details for the routine steqr interface are the following:



Holds the matrix Z of size (n,n).z

If omitted, this argument is restored based on the presence of argumentz as follows:

compz

compz = 'I', if z is present,compz = 'N', if z is omitted.If present, compz must be equal to 'I' or 'V' and the argument zmust also be present. Note that there will be an error condition if compzis present and z omitted.Note that two variants of Fortran 95 interface for steqr routine areneeded because of an ambiguous choice between real and complexcases appear when z is omitted. Thus, the name rsteqr is used in realcases (single or double precision), and the name steqr is used incomplex cases (single or double precision).

Application Notes



If λi is an exact eigenvalue, and μi is the corresponding computed value, then

|μi - λi| ≤ c(n)ε ||T||2

725



If zi is the corresponding exact eigenvector, and wi is the corresponding computed vector, then

the angle θ(zi, wi) between them is bounded as follows:

θ(zi, wi) ≤ c(n)ε ||T||2 / mini≠j|λi - λj|.

The total number of floating-point operations depends on how rapidly the algorithm converges.Typically, it is about

24n2 if compz = 'N';

7n3 (for complex flavors, 14n3) if compz = 'V' or 'I'.

?stemrComputes selected eigenvalues and eigenvectorsof a real symmetric tridiagonal matrix.

Syntax

Fortran 77:

call sstemr(jobz, range, n, d, e, vl, vu, il, iu, m, w, z, ldz, nzc, isuppz,tryrac, work, lwork, iwork, liwork, info)

call dstemr(jobz, range, n, d, e, vl, vu, il, iu, m, w, z, ldz, nzc, isuppz,tryrac, work, lwork, iwork, liwork, info)

call cstemr(jobz, range, n, d, e, vl, vu, il, iu, m, w, z, ldz, nzc, isuppz,tryrac, work, lwork, iwork, liwork, info)

call zstemr(jobz, range, n, d, e, vl, vu, il, iu, m, w, z, ldz, nzc, isuppz,tryrac, work, lwork, iwork, liwork, info)

Description

This routine computes selected eigenvalues and, optionally, eigenvectors of a real symmetrictridiagonal matrix T. Any such unreduced matrix has a well defined set of pairwise differentreal eigenvalues, the corresponding real eigenvectors are pairwise orthogonal.

The spectrum may be computed either completely or partially by specifying either an interval(vl,vu] or a range of indices il:iu for the desired eigenvalues.

726


Depending on the number of desired eigenvalues, these are computed either by bisection orthe dqds algorithm. Numerically orthogonal eigenvectors are computed by the use of varioussuitable L*D*LT factorizations near clusters of close eigenvalues (referred to as RRRs, RelativelyRobust Representations). An informal sketch of the algorithm follows.

For each unreduced block (submatrix) of T,

a. Compute T - sigma*I = L*D*LT, so that L and D define all the wanted eigenvalues tohigh relative accuracy. This means that small relative changes in the entries of L and D causeonly small relative changes in the eigenvalues and eigenvectors. The standard (unfactored)representation of the tridiagonal matrix T does not have this property in general.

b. Compute the eigenvalues to suitable accuracy. If the eigenvectors are desired, the algorithmattains full accuracy of the computed eigenvalues only right before the corresponding vectorshave to be computed, see steps c and d.

c. For each cluster of close eigenvalues, select a new shift close to the cluster, find a newfactorization, and refine the shifted eigenvalues to suitable accuracy.

d. For each eigenvalue with a large enough relative separation compute the correspondingeigenvector by forming a rank revealing twisted factorization. Go back to step c for anyclusters that remain.

For more details, see: [Dhillon04], [Dhillon04-02], [Dhillon97]

The routine works only on machines which follow IEEE-754 floating-point standard in theirhandling of infinities and NaNs (NaN stands for "not a number"). This permits the use of efficientinner loops avoiding a check for zero divisors.

LAPACK routines can be used to reduce a complex Hermitean matrix to real symmetric tridiagonalform.

(Any complex Hermitean tridiagonal matrix has real values on its diagonal and potentiallycomplex numbers on its off-diagonals. By applying a similarity transform with an appropriatediagonal matrix diag(1,e{i \phy_1}, ... , e{i \phy_{n-1}}), the complex Hermitean matrixcan be transformed into a real symmetric matrix and complex arithmetic can be entirely avoided.)While the eigenvectors of the real symmetric tridiagonal matrix are real, the eigenvectors oforiginal complex Hermitean matrix have complex entries in general. Since LAPACK driversoverwrite the matrix data with the eigenvectors, zstemr accepts complex workspace to facilitateinteroperability with zunmtr or zupmtr.

Input Parameters

CHARACTER*1. Must be 'N' or 'V'.jobzIf jobz = 'N', then only eigenvalues are computed.

727


If jobz = 'V', then eigenvalues and eigenvectors arecomputed.

CHARACTER*1. Must be 'A' or 'V' or 'I'.rangeIf range = 'A', the routine computes all eigenvalues.If range = 'V', the routine computes all eigenvalues inthe half-open interval: (vl, vu].If range = 'I', the routine computes eigenvalues withindices il to iu.

INTEGER. The order of the matrix T (n≥0).n

REAL for single precision flavorsdDOUBLE PRECISION for double precision flavors.Array, DIMENSION (n).Contains n diagonal elements of the tridiagonal matrix T.

REAL for single precision flavorseDOUBLE PRECISION for double precision flavors.Array, DIMENSION (n-1).Contains (n-1) off-diagonal elements of the tridiagonalmatrix T in elements 1 to n-1 of e. e(n) need not be seton input, but is used internally as workspace.

REAL for single precision flavorsvl, vuDOUBLE PRECISION for double precision flavors.If range = 'V', the lower and upper bounds of the intervalto be searched for eigenvalues. Constraint: vl<vu.If range = 'A' or 'I', vl and vu are not referenced.

INTEGER.il, iuIf range = 'I', the indices in ascending order of thesmallest and largest eigenvalues to be returned.

Constraint: 1≤il≤iu≤n, if n>0.If range = 'A' or 'V', il and iu are not referenced.

INTEGER. The leading dimension of the output array z.ldz

if jobz = 'V', then ldz ≥ max(1, n);

ldz ≥ 1 otherwise.

INTEGER. The number of eigenvectors to be held in thearray z.

nzc

If range = 'A', then nzc≥max(1, n);

728

If range = 'V', then nzc is greater than or equal to thenumber of eigenvalues in the half-open interval: (vl, vu].

If range = 'I', then nzc≥il+iu+1.This value is returned as the first entry of the array z, andno error message related to nzc is issued by the routinexerbla.

LOGICAL.tryracIf tryrac = .TRUE., it indicates that the code should checkwhether the tridiagonal matrix defines its eigenvalues tohigh relative accuracy. If so, the code uses relative-accuracypreserving algorithms that might be (a bit) slower dependingon the matrix. If the matrix does not define its eigenvaluesto high relative accuracy, the code can uses possibly fasteralgorithms.If tryrac = .FALSE., the code is not required to guaranteerelatively accurate eigenvalues and can use the fastestpossible techniques.

REAL for single precision flavorsworkDOUBLE PRECISION for double precision flavors.Workspace array, DIMENSION (lwork).

INTEGER.lworkThe dimension of the array work,

lwork≥max(1, 18*n).If lwork=-1, then a workspace query is assumed; theroutine only calculates the optimal size of the work array,returns this value as the first entry of the work array, andno error message related to lwork is issued by xerbla.

INTEGER.iworkWorkspace array, DIMENSION (liwork).

INTEGER.liworkThe dimension of the array iwork.

lwork≥max(1, 10*n) if the eigenvectors are desired, and

lwork≥max(1, 8*n) if only the eigenvalues are to becomputed.

729


If liwork=-1, then a workspace query is assumed; theroutine only calculates the optimal size of the iwork array,returns this value as the first entry of the iwork array, andno error message related to liwork is issued by xerbla.

Output Parameters

On exit, the array e is overwritten.e

INTEGER.m

The total number of eigenvalues found, 0≤m≤n.If range = 'A', then m=n, and if If range = 'I', thenm=iu-il+1.

REAL for single precision flavorswDOUBLE PRECISION for double precision flavors.Array, DIMENSION (n).The first m elements contain the selected eigenvaluesin ascending order.REAL for sstemrzDOUBLE PRECISION for dstemrCOMPLEX for cstemrDOUBLE COMPLEX for zstemr.Array z(ldz, *), the second dimension of z must be at leastmax(1, m).If jobz = 'V', and info = 0, then the first m columns ofz contain the orthonormal eigenvectors of the matrix Tcorresponding to the selected eigenvalues, with the i-thcolumn of z holding the eigenvector associated with w(i).If jobz = 'N', then z is not referenced.Note: you must ensure that at least max(1,m) columns aresupplied in the array z ; if range = 'V', the exact valueof m is not known in advance and an can be computed witha workspace query by setting nzc=-1, see description ofthe parameter nzc.

INTEGER.isuppzArray, DIMENSION (2*max(1, m)).

730


The support of the eigenvectors in z, that is the indicesindicating the nonzero elements in z. The i-th computedeigenvector is nonzero only in elements isuppz(2*i-1)through isuppz(2*i). This is relevant in the case when thematrix is split. isuppz is only accessed when jobz = 'V'and n>0.

On exit, TRUE. tryrac is set to .FALSE. if the matrix doesnot define its eigenvalues to high relative accuracy.

tryrac

On exit, if info = 0, then work(1) returns the optimal(and minimal) size of lwork.

work(1)

On exit, if info = 0, then iwork(1) returns the optimalsize of liwork.

iwork(1)

INTEGER.infoIf = 0, the execution is successful.If info = -i, the i-th parameter had an illegal value.If info = 1, internal error in ?larre occurred,if info = 2, internal error in ?larrv occurred.

?stedcComputes all eigenvalues and eigenvectors of asymmetric tridiagonal matrix using the divide andconquer method.

Syntax

Fortran 77:

call sstedc(compz, n, d, e, z, ldz, work, lwork, iwork, liwork, info)

call dstedc(compz, n, d, e, z, ldz, work, lwork, iwork, liwork, info)

call cstedc(compz, n, d, e, z, ldz, work, lwork, rwork, lrwork, iwork, liwork,info)

call zstedc(compz, n, d, e, z, ldz, work, lwork, rwork, lrwork, iwork, liwork,info)

731


Fortran 95:

call rstedc(d, e [,z] [,compz] [,info])

call stedc(d, e [,z] [,compz] [,info])

Description

This routine computes all the eigenvalues and (optionally) all the eigenvectors of a symmetrictridiagonal matrix using the divide and conquer method. The eigenvectors of a full or band realsymmetric or complex Hermitian matrix can also be found if ?sytrd/?hetrd or ?sptrd/?hptrdor ?sbtrd/?hbtrd has been used to reduce this matrix to tridiagonal form.

See also ?laed0, ?laed1, ?laed2, ?laed3, ?laed4, ?laed5, ?laed6, ?laed7, ?laed8,?laed9, and ?laeda used by this function.

Input Parameters

CHARACTER*1. Must be 'N' or 'I' or 'V'.compzIf compz = 'N', the routine computes eigenvalues only.If compz = 'I', the routine computes the eigenvalues andeigenvectors of the tridiagonal matrix.If compz = 'V', the routine computes the eigenvalues andeigenvectors of original symmetric/Hermitian matrix. Onentry, the array z must contain the orthogonal/unitarymatrix used to reduce the original matrix to tridiagonal form.

INTEGER. The order of the symmetric tridiagonal matrix (n

≥ 0).

n

REAL for single-precision flavorsd, e, rworkDOUBLE PRECISION for double-precision flavors.Arrays:d(*) contains the diagonal elements of the tridiagonalmatrix.The dimension of d must be at least max(1, n).e(*) contains the subdiagonal elements of the tridiagonalmatrix.The dimension of e must be at least max(1, n-1).rwork is a workspace array, its dimension max(1,lrwork).

REAL for sstedcz, workDOUBLE PRECISION for dstedc

732


COMPLEX for cstedcDOUBLE COMPLEX for zstedc.Arrays: z(ldz, *), work(*).If compz = 'V', then, on entry, z must contain theorthogonal/unitary matrix used to reduce the original matrixto tridiagonal form.The second dimension of z must be at least max(1, n).work is a workspace array, its dimension max(1,lwork).




INTEGER. The dimension of the array work.lwork

If compz = 'N'or 'I', or n ≤ 1, lwork must be at least1.If compz = 'V' and n > 1, lwork must be at least n*n.Note that for compz = 'V', and if n is less than or equalto the minimum divide size, usually 25, then lwork needonly be 1.If lwork = -1, then a workspace query is assumed; theroutine only calculates the optimal size of the work, rworkand iwork arrays, returns these values as the first entriesof the work, rwork and iwork arrays, and no error messagerelated to lwork or lrwork or liwork is issued by xerbla.See Application Notes for the required value of lwork.

INTEGER. The dimension of the array rwork (used forcomplex flavors only).

lrwork

If compz = 'N', or n ≤ 1, lrwork must be at least 1.If compz = 'V' and n > 1, lrwork must be at least(1+3*n+2*n*lg(n)+3*n*n), where lg(n)is the smallest

integer k such that 2**k≥n.If compz = 'I' and n > 1, lrwork must be at least(1+4*n+2*n*n).Note that for compz = 'V'or 'I', and if n is less than orequal to the minimum divide size, usually 25, then lrworkneed only be max(1, 2*(n-1)).

733


If lrwork = -1, then a workspace query is assumed; theroutine only calculates the optimal size of the work, rworkand iwork arrays, returns these values as the first entriesof the work, rwork and iwork arrays, and no error messagerelated to lwork or lrwork or liwork is issued by xerbla.See Application Notes for the required value of lrwork.

INTEGER. Workspace array, its dimension max(1, liwork).iwork

INTEGER. The dimension of the array iwork.liwork

If compz = 'N', or n ≤ 1, liwork must be at least 1.If compz = 'V' and n > 1, liwork must be at least(6+6*n+5*n*lg(n), where lg(n)is the smallest integer k

such that 2**k≥n.If compz = 'I' and n > 1, liwork must be at least(3+5*n).Note that for compz = 'V'or 'I', and if n is less than orequal to the minimum divide size, usually 25, then liworkneed only be 1.If liwork = -1, then a workspace query is assumed; theroutine only calculates the optimal size of the work, rworkand iwork arrays, returns these values as the first entriesof the work, rwork and iwork arrays, and no error messagerelated to lwork or lrwork or liwork is issued by xerbla.See Application Notes for the required value of liwork.

Output Parameters

The n eigenvalues in ascending order, unless info ≠ 0.d

See also info.


If info = 0, then if compz = 'V', z contains theorthonormal eigenvectors of the originalsymmetric/Hermitian matrix, and if compz = 'I', z containsthe orthonormal eigenvectors of the symmetric tridiagonalmatrix. If compz = 'N', z is not referenced.

z

On exit, if info = 0, then work(1) returns the optimallwork.

work(1)

734


On exit, if info = 0, then rwork(1) returns the optimallrwork (for complex flavors only).

rwork(1)

On exit, if info = 0, then iwork(1) returns the optimalliwork.

iwork(1)

INTEGER.infoIf info = 0, the execution is successful.If info = -i, the i-th parameter had an illegal value. Ifinfo = i, the algorithm failed to compute an eigenvaluewhile working on the submatrix lying in rows and columnsi/(n+1) through mod(i, n+1).



Specific details for the routine stedc interface are the following:




If omitted, this argument is restored based on the presence of argumentz as follows: compz = 'I', if z is present, compz = 'N', if z is omitted.

compz

If present, compz must be equal to 'I' or 'V' and the argument zmust also be present. Note that there will be an error condition if compzis present and z omitted.

Note that two variants of Fortran 95 interface for stedc routine are needed because of anambiguous choice between real and complex cases appear when z and work are omitted. Thus,the name rstedc is used in real cases (single or double precision), and the name stedc isused in complex cases (single or double precision).

Application Notes

The required size of workspace arrays must be as follows.

For sstedc/dstedc:

If compz = 'N' or n ≤ 1 then lwork must be at least 1.

735


If compz = 'V' and n > 1 then lwork must be at least (1 + 3n + 2n·lgn + 3n2), where

lg(n) = smallest integer k such that 2k≥ n.

If compz = 'I' and n > 1 then lwork must be at least (1 + 4n + n2).

If compz = 'N' or n ≤ 1 then liwork must be at least 1.

If compz = 'V' and n > 1 then liwork must be at least (6 + 6n + 5n·lgn).

If compz = 'I' and n > 1 then liwork must be at least (3 + 5n).

For cstedc/zstedc:

If compz = 'N' or'I', or n ≤ 1, lwork must be at least 1.

If compz = 'V' and n > 1, lwork must be at least n2.

If compz = 'N' or n ≤ 1, lrwork must be at least 1.

If compz = 'V' and n > 1, lrwork must be at least (1 + 3n + 2n·lgn + 3n2), where lg(n ) =

smallest integer k such that 2k≥ n.

If compz = 'I' and n > 1, lrwork must be at least(1 + 4n + 2n2).

The required value of liwork for complex flavors is the same as for real flavors.

You may set lwork (or liwork or lrwork, if supplied) to -1. The routine returns immediatelyand provides the recommended workspace in the first element of the corresponding array(work, iwork, rwork). This operation is called a workspace query.

Note that if you set lwork (liwork, lrwork) to less than the minimal required value and not-1, the routine returns immediately with an error exit and does not provide any information onthe recommended workspace.

736


?stegrComputes selected eigenvalues and eigenvectorsof a real symmetric tridiagonal matrix.

Syntax

Fortran 77:

call sstegr(jobz, range, n, d, e, vl, vu, il, iu, abstol, m, w, z, ldz, isuppz,work, lwork, iwork, liwork, info)

call dstegr(jobz, range, n, d, e, vl, vu, il, iu, abstol, m, w, z, ldz, isuppz,work, lwork, iwork, liwork, info)

call cstegr(jobz, range, n, d, e, vl, vu, il, iu, abstol, m, w, z, ldz, isuppz,work, lwork, iwork, liwork, info)

call zstegr(jobz, range, n, d, e, vl, vu, il, iu, abstol, m, w, z, ldz, isuppz,work, lwork, iwork, liwork, info)

Fortran 95:

call rstegr(d, e, w [,z] [,vl] [,vu] [,il] [,iu] [,m] [,isuppz] [,abstol][,info])

call stegr(d, e, w [,z] [,vl] [,vu] [,il] [,iu] [,m] [,isuppz] [,abstol][,info])

Description

This routine computes selected eigenvalues and, optionally, eigenvectors of a real symmetrictridiagonal matrix T. Any such unreduced matrix has a well defined set of pairwise differentreal eigenvalues, the corresponding real eigenvectors are pairwise orthogonal.

The spectrum may be computed either completely or partially by specifying either an interval(vl,vu] or a range of indices il:iu for the desired eigenvalues.

?sregr is a compatability wrapper around the improved ?stemr routine. See it descriprion forfurther details.

Note that the abstol parameter no longer provides any benefit and hence is no longer used.

See also auxiliary ?lasq2 ?lasq5, ?lasq6 , used by this routine.

737


Input Parameters

CHARACTER*1. Must be 'N' or 'V'.jobzIf job = 'N', then only eigenvalues are computed.If job = 'V', then eigenvalues and eigenvectors arecomputed.

CHARACTER*1. Must be 'A' or 'V' or 'I'.rangeIf range = 'A', the routine computes all eigenvalues.If range = 'V', the routine computes eigenvalueslambda(i) in the half-open interval:

vl< lambda(i) ≤ vu.If range = 'I', the routine computes eigenvalues withindices il to iu.


REAL for single precision flavorsd, e, workDOUBLE PRECISION for double precision flavors.Arrays:d(*) contains the diagonal elements of T.The dimension of d must be at least max(1, n).e(*) contains the subdiagonal elements of T in elements 1to n-1; e(n) need not be set on input, but it is used as aworkspace.The dimension of e must be at least max(1, n).work(lwork) is a workspace array.

REAL for single precision flavorsvl, vuDOUBLE PRECISION for double precision flavors.If range = 'V', the lower and upper bounds of the intervalto be searched for eigenvalues.Constraint: vl< vu.If range = 'A' or 'I', vl and vu are not referenced.


Constraint: 1 ≤ il ≤ iu ≤ n, if n > 0.If range = 'A' or 'V', il and iu are not referenced.

REAL for single precision flavorsabstol

738

DOUBLE PRECISION for double precision flavors.Unused. Was the absolute error tolerance for theeigenvalues/eigenvectors in previous versions.INTEGER. The leading dimension of the output array z.Constraints:

ldz

ldz < 1 if jobz = 'N';ldz < max(1, n) jobz = 'V', an.

INTEGER.lworkThe dimension of the array work,

lwork≥max(1, 18*n) if jobz = 'V', and

lwork≥max(1, 12*n) if jobz = 'N'.If lwork = -1, then a workspace query is assumed; theroutine only calculates the optimal size of the work array,returns this value as the first entry of the work array, andno error message related to lwork is issued by xerbla. SeeApplication Notes below for details.

INTEGER.iworkWorkspace array, DIMENSION (liwork).

INTEGER.liwork

The dimension of the array iwork, lwork ≥ max(1, 10*n)

if the eigenvectors are desired, and lwork ≥ max(1, 8*n)if only the eigenvalues are to be computed..If liwork = -1, then a workspace query is assumed; theroutine only calculates the optimal size of the iwork array,returns this value as the first entry of the iwork array, andno error message related to liwork is issued by xerbla.See Application Notes below for details.

Output Parameters

On exit, d and e are overwritten.d, e

INTEGER. The total number of eigenvalues found,m

0 ≤ m ≤ n.If range = 'A', m = n, and if range = 'I', m = iu-il+1.

REAL for single precision flavorswDOUBLE PRECISION for double precision flavors.

739

Array, DIMENSION at least max(1, n).The selected eigenvalues in ascending order, stored in w(1)to w(m).

REAL for sstegrzDOUBLE PRECISION for dstegrCOMPLEX for cstegrDOUBLE COMPLEX for zstegr.Array z(ldz, *), the second dimension of z must be at leastmax(1, m).If jobz = 'V', and if info = 0, the first m columns of zcontain the orthonormal eigenvectors of the matrix Tcorresponding to the selected eigenvalues, with the i-thcolumn of z holding the eigenvector associated with w(i).If jobz = 'N', then z is not referenced.Note: you must ensure that at least max(1,m) columns aresupplied in the array z ; if range = 'V', the exact valueof m is not known in advance and an upper bound must beused. Supplying n columns is always safe.

INTEGER.isuppzArray, DIMENSION at least (2*max(1, m)).The support of the eigenvectors in z, that is the indicesindicating the nonzero elements in z. The i-th computedeigenvector is nonzero only in elements isuppz(2*i-1)through isuppz(2*i). This is relevant in the case whenthe matrix is split. isuppz is only accessed when jobz ='V', and n > 0.

On exit, if info = 0, then work(1) returns the requiredminimal size of lwork.

work(1)

On exit, if info = 0, then iwork(1) returns the requiredminimal size of liwork.

iwork(1)

INTEGER.infoIf info = 0, the execution is successful.If info = -i, the i-th parameter had an illegal value.If info = 1x, internal error in ?larre occurred,If info = 2x, internal error in ?larrv occurred. Here thedigit x = abs(iinfo) < 10, where iinfo is the non-zeroerror code returned by ?larre or ?larrv, respectively.

740



Specific details for the routine stegr interface are the following:



Holds the vector of length (n).w

Holds the matrix Z of size (n,m).z

Holds the vector of length (2*m).isuppz

Default value for this argument is vl = - HUGE (vl) where HUGE(a)means the largest machine number of the same precision as argumenta.

vl

Default value for this argument is vu = HUGE (vl).vu

Default value for this argument is il = 1.il

Default value for this argument is iu = n.iu

Default value for this argument is abstol = 0.0_WP.abstol

Restored based on the presence of the argument z as follows:jobzjobz = 'V', if z is present,jobz = 'N', if z is omitted.

Restored based on the presence of arguments vl, vu, il, iu as follows:rangerange = 'V', if one of or both vl and vu are present,range = 'I', if one of or both il and iu are present,range = 'A', if none of vl, vu, il, iu is present,Note that there will be an error condition if one of or both vl and vuare present and at the same time one of or both il and iu are present.

Note that two variants of Fortran 95 interface for stegr routine are needed because of anambiguous choice between real and complex cases appear when z is omitted. Thus, the namerstegr is used in real cases (single or double precision), and the name stegr is used in complexcases (single or double precision).

741


Application Notes

Currently ?stegr is only set up to find all the n eigenvalues and eigenvectors of T in O(n2)time, that is, only range = 'A' is supported.

Currently the routine ?stein is called when an appropriate si cannot be chosen in step (c)above. ?stein invokes modified Gram-Schmidt when eigenvalues are close.

?stegr works only on machines which follow IEEE-754 floating-point standard in their handlingof infinities and NaNs. Normal execution of ?stegr may create NaNs and infinities and hencemay abort due to a floating point exception in environments which do not conform to theIEEE-754 standard.

If you are in doubt how much workspace to supply, use a generous value of lwork (or liwork)for the first run or set lwork = -1 (liwork = -1).

If you choose the first option and set any of admissible lwork (or liwork) sizes, which is noless than the minimal value described, the routine completes the task, though probably not sofast as with a recommended workspace, and provides the recommended workspace in the firstelement of the corresponding array (work, iwork) on exit. Use this value (work(1), iwork(1))for subsequent runs.

If you set lwork = -1 (liwork = -1), the routine returns immediately and provides therecommended workspace in the first element of the corresponding array (work, iwork). Thisoperation is called a workspace query.

Note that if you set lwork (liwork) to less than the minimal required value and not -1, theroutine returns immediately with an error exit and does not provide any information on therecommended workspace.

742


?pteqrComputes all eigenvalues and (optionally) alleigenvectors of a real symmetric positive-definitetridiagonal matrix.

Syntax

Fortran 77:

call spteqr(compz, n, d, e, z, ldz, work, info)

call dpteqr(compz, n, d, e, z, ldz, work, info)

call cpteqr(compz, n, d, e, z, ldz, work, info)

call zpteqr(compz, n, d, e, z, ldz, work, info)

Fortran 95:

call rpteqr(d, e [,z] [,compz] [,info])

call pteqr(d, e [,z] [,compz] [,info])

Description

This routine computes all the eigenvalues and (optionally) all the eigenvectors of a real symmetricpositive-definite tridiagonal matrix T. In other words, the routine can compute the spectral

factorization: T = ZΛZT.

Here Λ is a diagonal matrix whose diagonal elements are the eigenvalues λi; Z is an orthogonalmatrix whose columns are eigenvectors. Thus,

Tzi = λizi for i = 1, 2, ..., n.

(The routine normalizes the eigenvectors so that ||zi||2 = 1.)

You can also use the routine for computing the eigenvalues and eigenvectors of real symmetric(or complex Hermitian) positive-definite matrices A reduced to tridiagonal form T: A = Q*T*QH.

In this case, the spectral factorization is as follows: A = Q*T*QH = (QZ)Λ(QZ)H. Before calling?pteqr, you must reduce A to tridiagonal form and generate the explicit matrix Q by callingthe following routines:

743


for complex matrices:for real matrices:

?hetrd, ?ungtr?sytrd, ?orgtrfull storage

?hptrd, ?upgtr?sptrd, ?opgtrpacked storage

?hbtrd (vect='V')?sbtrd (vect='V')band storage

The routine first factorizes T as L*D*LH where L is a unit lower bidiagonal matrix, and D is adiagonal matrix. Then it forms the bidiagonal matrix B = L*D1/2 and calls ?bdsqr to computethe singular values of B, which are the same as the eigenvalues of T.

Input Parameters

CHARACTER*1. Must be 'N' or 'I' or 'V'.compzIf compz = 'N', the routine computes eigenvalues only.If compz = 'I', the routine computes the eigenvalues andeigenvectors of the tridiagonal matrix T.If compz = 'V', the routine computes the eigenvalues andeigenvectors of A (and the array z must contain the matrixQ on entry).


REAL for single-precision flavorsd, e, workDOUBLE PRECISION for double-precision flavors.Arrays:d(*) contains the diagonal elements of T.The dimension of d must be at least max(1, n).e(*) contains the off-diagonal elements of T.The dimension of e must be at least max(1, n-1).work(*) is a workspace array.The dimension of work must be:at least 1 if compz = 'N';at least max(1, 4*n-4) if compz = 'V' or 'I'.

REAL for spteqrzDOUBLE PRECISION for dpteqrCOMPLEX for cpteqrDOUBLE COMPLEX for zpteqr.Array, DIMENSION (ldz,*)

744


If compz = 'N' or 'I', z need not be set.If vect = 'V', z must contains the n-by-n matrix Q.The second dimension of z must be:at least 1 if compz = 'N';at least max(1, n) if compz = 'V' or 'I'.




Output Parameters

The n eigenvalues in descending order, unless info > 0.dSee also info.

On exit, the array is overwritten.e

If info = 0, contains the n orthonormal eigenvectors,stored by columns. (The ith column corresponds to the itheigenvalue.)

z

INTEGER.infoIf info = 0, the execution is successful.If info = i, the leading minor of order i (and hence Titself) is not positive-definite.If info = n + i, the algorithm for computing singularvalues failed to converge; i off-diagonal elements have notconverged to zero.If info = -i, the i-th parameter had an illegal value.



Specific details for the routine pteqr interface are the following:




745



compz

compz = 'I', if z is present,compz = 'N', if z is omitted.If present, compz must be equal to 'I' or 'V' and the argument zmust also be present. Note that there will be an error condition if compzis present and z omitted.

Note that two variants of Fortran 95 interface for pteqr routine are needed because of anambiguous choice between real and complex cases appear when z is omitted. Thus, the namerpteqr is used in real cases (single or double precision), and the name pteqr is used in complexcases (single or double precision).

Application Notes


|μi - λi| ≤ c(n)εKλi

where c(n) is a modestly increasing function of n, ε is the machine precision, and K = ||DTD||2||(DTD)-1||2, D is diagonal with dii = tii

-1/2.


the angle θ(zi, wi) between them is bounded as follows: θ(ui, wi) ≤ c(n)εK / mini≠j(|λi- λj|/|λi + λj|).

Here mini≠j(|λi - λj|/|λi + λj|) is the relative gap between λi and the other eigenvalues.

The total number of floating-point operations depends on how rapidly the algorithm converges.

Typically, it is about

30n2 if compz = 'N';

6n3 (for complex flavors, 12n3) if compz = 'V' or 'I'.

746


?stebzComputes selected eigenvalues of a real symmetrictridiagonal matrix by bisection.

Syntax

Fortran 77:

call sstebz (range, order, n, vl, vu, il, iu, abstol, d, e, m, nsplit, w,iblock, isplit, work, iwork, info)

call dstebz (range, order, n, vl, vu, il, iu, abstol, d, e, m, nsplit, w,iblock, isplit, work, iwork, info)

Fortran 95:

call stebz(d, e, m, nsplit, w, iblock, isplit [, order] [,vl] [,vu] [,il][,iu] [,abstol] [,info])

Description

This routine computes some (or all) of the eigenvalues of a real symmetric tridiagonal matrixT by bisection. The routine searches for zero or negligible off-diagonal elements to see if T splitsinto block-diagonal form T = diag(T1, T2, ...). Then it performs bisection on each of theblocks Ti and returns the block index of each computed eigenvalue, so that a subsequent callto ?stein can also take advantage of the block structure.

See also ?laebz.

Input Parameters

CHARACTER*1. Must be 'A' or 'V' or 'I'.rangeIf range = 'A', the routine computes all eigenvalues.If range = 'V', the routine computes eigenvalues

lambda(i) in the half-open interval: vl < lambda(i) ≤vu.If range = 'I', the routine computes eigenvalues withindices il to iu.

CHARACTER*1. Must be 'B' or 'E'.orderIf order = 'B', the eigenvalues are to be ordered fromsmallest to largest within each split-off block.

747

If order = 'E', the eigenvalues for the entire matrix areto be ordered from smallest to largest.


REAL for sstebzvl, vuDOUBLE PRECISION for dstebz.If range = 'V', the routine computes eigenvalueslambda(i) in the half-open interval:

vl < lambda(i) ≤ vu.If range = 'A' or 'I', vl and vu are not referenced.

INTEGER. Constraint: 1 ≤ il ≤ iu ≤ n.il, iu

If range = 'I', the routine computes eigenvalues

lambda(i) such that il≤ i ≤ iu (assuming that theeigenvalues lambda(i) are in ascending order).If range = 'A' or 'V', il and iu are not referenced.

REAL for sstebzabstolDOUBLE PRECISION for dstebz.The absolute tolerance to which each eigenvalue is required.An eigenvalue (or cluster) is considered to have convergedif it lies in an interval of width abstol.

If abstol ≤ 0.0, then the tolerance is taken as eps*|T|,where eps is the machine precision, and |T| is the 1-normof the matrix T.

REAL for sstebzd, e, workDOUBLE PRECISION for dstebz.Arrays:d(*) contains the diagonal elements of T.The dimension of d must be at least max(1, n).e(*) contains the off-diagonal elements of T.The dimension of e must be at least max(1, n-1).work(*) is a workspace array.The dimension of work must be at least max(1, 4n).

INTEGER. Workspace.iworkArray, DIMENSION at least max(1, 3n).

748

Output Parameters

INTEGER. The actual number of eigenvalues found.m

INTEGER. The number of diagonal blocks detected in T.nsplit

REAL for sstebzwDOUBLE PRECISION for dstebz.Array, DIMENSION at least max(1, n). The computedeigenvalues, stored in w(1) to w(m).

INTEGER.iblock, isplitArrays, DIMENSION at least max(1, n).A positive value iblock(i) is the block number of theeigenvalue stored in w(i) (see also info).The leading nsplit elements of isplit contain points atwhich T splits into blocks Ti as follows: the block T1 containsrows/columns 1 to isplit(1); the block T2 containsrows/columns isplit(1)+1 to isplit(2), and so on.

INTEGER.infoIf info = 0, the execution is successful.If info = 1, for range = 'A' or 'V', the algorithm failedto compute some of the required eigenvalues to the desiredaccuracy; iblock(i)<0 indicates that the eigenvalue storedin w(i) failed to converge.If info = 2, for range = 'I', the algorithm failed tocompute some of the required eigenvalues. Try calling theroutine again with range = 'A'.If info = 3:for range = 'A' or 'V', same as info = 1;for range = 'I', same as info = 2.If info = 4, no eigenvalues have been computed. Thefloating-point arithmetic on the computer is not behavingas expected.If info = -i, the i-th parameter had an illegal value.



749

Specific details for the routine stebz interface are the following:




Holds the vector of length (n).iblock

Holds the vector of length (n).isplit

Must be 'B' or 'E'. The default value is 'B'.order

Default value for this argument is vl = - HUGE (vl) where HUGE(a)means the largest machine number of the same precision as argumenta.

vl

Default value for this argument is vu = HUGE (vl).vu



Default value for this argument is abstol = 0.0_WP.abstol

Restored based on the presence of arguments vl, vu, il, iu as follows:rangerange = 'V', if one of or both vl and vu are present,range = 'I', if one of or both il and iu are present,range = 'A', if none of vl, vu, il,iu is present, Note that there will be an error condition if one of or bothvl and vu are present and at the same time one of or both il and iuare present.

Application Notes

The eigenvalues of T are computed to high relative accuracy which means that if they varywidely in magnitude, then any small eigenvalues will be computed more accurately than, forexample, with the standard QR method. However, the reduction to tridiagonal form (prior tocalling the routine) may exclude the possibility of obtaining high relative accuracy in the smalleigenvalues of the original matrix if its eigenvalues vary widely in magnitude.

750


?steinComputes the eigenvectors corresponding tospecified eigenvalues of a real symmetrictridiagonal matrix.

Syntax

Fortran 77:

call sstein(n, d, e, m, w, iblock, isplit, z, ldz, work, iwork, ifailv, info)

call dstein(n, d, e, m, w, iblock, isplit, z, ldz, work, iwork, ifailv, info)

call cstein(n, d, e, m, w, iblock, isplit, z, ldz, work, iwork, ifailv, info)

call zstein(n, d, e, m, w, iblock, isplit, z, ldz, work, iwork, ifailv, info)

Fortran 95:

call stein(d, e, w, iblock, isplit, z [,ifailv] [,info])

Description

This routine computes the eigenvectors of a real symmetric tridiagonal matrix T correspondingto specified eigenvalues, by inverse iteration. It is designed to be used in particular after thespecified eigenvalues have been computed by ?stebz with order = 'B', but may also beused when the eigenvalues have been computed by other routines.

If you use this routine after ?stebz, it can take advantage of the block structure by performinginverse iteration on each block Ti separately, which is more efficient than using the wholematrix T.

If T has been formed by reduction of a full symmetric or Hermitian matrix A to tridiagonal form,you can transform eigenvectors of T to eigenvectors of A by calling ?ormtr or ?opmtr (for realflavors) or by calling ?unmtr or ?upmtr (for complex flavors).

Input Parameters


INTEGER. The number of eigenvectors to be returned.m

REAL for single-precision flavorsd, e, wDOUBLE PRECISION for double-precision flavors.Arrays:

751


d(*) contains the diagonal elements of T.The dimension of d must be at least max(1, n).e(*) contains the sub-diagonal elements of T stored inelements 1 to n-1The dimension of e must be at least max(1, n-1).w(*) contains the eigenvalues of T, stored in w(1) to w(m)(as returned by ?stebz). Eigenvalues of T1 must be suppliedfirst, in non-decreasing order; then those of T2, again innon-decreasing order, and so on. Constraint:

if iblock(i) = iblock(i+1), w(i) ≤ w(i+1).The dimension of w must be at least max(1, n).

INTEGER.iblock, isplitArrays, DIMENSION at least max(1, n). The arrays iblockand isplit, as returned by ?stebz with order = 'B'.If you did not call ?stebz with order = 'B', set allelements of iblock to 1, and isplit(1) to n.)

INTEGER. The first dimension of the output array z; ldz ≥max(1, n).

ldz

REAL for single-precision flavorsworkDOUBLE PRECISION for double-precision flavors.Workspace array, DIMENSION at least max(1, 5n).


Output Parameters

REAL for ssteinzDOUBLE PRECISION for dsteinCOMPLEX for csteinDOUBLE COMPLEX for zstein.Array, DIMENSION (ldz, *).If info = 0, z contains the m orthonormal eigenvectors,stored by columns. (The ith column corresponds to the i-thspecified eigenvalue.)

INTEGER.ifailvArray, DIMENSION at least max(1, m).

752


If info = i > 0, the first i elements of ifailv contain theindices of any eigenvectors that failed to converge.

INTEGER.infoIf info = 0, the execution is successful.If info = i, then i eigenvectors (as indicated by theparameter ifailv) each failed to converge in 5 iterations.The current iterates are stored in the corresponding columnsof the array z.If info = -i, the i-th parameter had an illegal value.



Specific details for the routine stein interface are the following:




Holds the vector of length (n).iblock

Holds the vector of length (n).isplit

Holds the matrix Z of size (n,m).z

Holds the vector of length (m).ifailv

Application Notes

Each computed eigenvector zi is an exact eigenvector of a matrix T + Ei, where ||Ei||2 =

O(ε) ||T||2. However, a set of eigenvectors computed by this routine may not be orthogonalto so high a degree of accuracy as those computed by ?steqr.

753


?disnaComputes the reciprocal condition numbers for theeigenvectors of a symmetric/ Hermitian matrix orfor the left or right singular vectors of a generalmatrix.

Syntax

Fortran 77:

call sdisna(job, m, n, d, sep, info)

call ddisna(job, m, n, d, sep, info)

Fortran 95:

call disna(d, sep [,job] [,minmn] [,info])

Description

This routine computes the reciprocal condition numbers for the eigenvectors of a real symmetricor complex Hermitian matrix or for the left or right singular vectors of a general m-by-n matrix.

The reciprocal condition number is the 'gap' between the corresponding eigenvalue or singularvalue and the nearest other one.

The bound on the error, measured by angle in radians, in the i-th computed vector is givenby

slamch('E')*(anorm/sep(i))

where anorm = ||A||2 = max( |d(j)| ). sep(i) is not allowed to be smaller thanslamch('E')*anorm in order to limit the size of the error bound.

?disna may also be used to compute error bounds for eigenvectors of the generalized symmetricdefinite eigenproblem.

Input Parameters

CHARACTER*1. Must be 'E','L', or 'R'. Specifies for whichproblem the reciprocal condition numbers should becomputed:

job

job = 'E': for the eigenvectors of a symmetric/Hermitianmatrix;job = 'L': for the left singular vectors of a general matrix;

754


job = 'R': for the right singular vectors of a general matrix.

INTEGER. The number of rows of the matrix (m ≥ 0).m

INTEGER.nIf job = 'L', or 'R', the number of columns of the matrix

(n ≥ 0). Ignored if job = 'E'.

REAL for sdisnadDOUBLE PRECISION for ddisna.Array, dimension at least max(1,m) if job = 'E', and atleast max(1, min(m,n)) if job = 'L' or 'R'.This array must contain the eigenvalues (if job = 'E') orsingular values (if job = 'L' or 'R') of the matrix, in eitherincreasing or decreasing order.If singular values, they must be non-negative.

Output Parameters

REAL for sdisnasepDOUBLE PRECISION for ddisna.Array, dimension at least max(1,m) if job = 'E', and atleast max(1, min(m,n)) if job = 'L' or 'R'. The reciprocalcondition numbers of the vectors.




Specific details for the routine disna interface are the following:


Holds the vector of length min(m,n).sep

Must be 'E', 'L', or 'R'. The default value is 'E'.job

Indicates which of the values m or n is smaller. Must be either 'M' or'N', the default is 'M'.

minmn

755


If job = 'E', this argument is superfluous, If job = 'L' or 'R', thisargument is used by the routine.

Generalized Symmetric-Definite Eigenvalue Problems

Generalized symmetric-definite eigenvalue problems are as follows: find the

eigenvalues λ and the corresponding eigenvectors z that satisfy one of these equations:

Az = λBz, ABz = λz, or = λz,

where A is an n-by-n symmetric or Hermitian matrix, and B is an n-by-n symmetricpositive-definite or Hermitian positive-definite matrix.

In these problems, there exist n real eigenvectors corresponding to real eigenvalues (even forcomplex Hermitian matrices A and B).

Routines described in this section allow you to reduce the above generalized problems to

standard symmetric eigenvalue problem Cy = λy, which you can solve by calling LAPACKroutines described earlier in this chapter (see Symmetric Eigenvalue Problems).

Different routines allow the matrices to be stored either conventionally or in packed storage.Prior to reduction, the positive-definite matrix B must first be factorized using either ?potrfor ?pptrf.

The reduction routine for the banded matrices A and B uses a split Cholesky factorization forwhich a specific routine ?pbstf is provided. This refinement halves the amount of work requiredto form matrix C.

Table 4-4 lists LAPACK routines (Fortran-77 interface) that can be used to solve generalizedsymmetric-definite eigenvalue problems. Respective routine names in Fortran-95 interface arewithout the first symbol (see Routine Naming Conventions).

Table 4-4 Computational Routines for Reducing Generalized Eigenproblems to StandardProblems

Factorizebandmatrix

Reduce to standardproblems (bandmatrices)

Reduce to standardproblems (packedstorage)

Reduce to standardproblems (fullstorage)

Matrixtype

?pbstf?sbgst?spgst?sygstrealsymmetricmatrices

756


Factorizebandmatrix

Reduce to standardproblems (bandmatrices)

Reduce to standardproblems (packedstorage)

Reduce to standardproblems (fullstorage)

Matrixtype

?pbstf?hbgst?hpgst?hegstcomplexHermitianmatrices

?sygstReduces a real symmetric-definite generalizedeigenvalue problem to the standard form.

Syntax

Fortran 77:

call ssygst(itype, uplo, n, a, lda, b, ldb, info)

call dsygst(itype, uplo, n, a, lda, b, ldb, info)

Fortran 95:

call sygst(a, b [,itype] [,uplo] [,info])

Description

This routine reduces real symmetric-definite generalized eigenproblems

Az = λBz, ABz = λz, or BAz = λz

to the standard form Cy = λy. Here A is a real symmetric matrix, and B is a real symmetricpositive-definite matrix. Before calling this routine, call ?potrf to compute the Choleskyfactorization: B = UTU or B = LLT.

Input Parameters

INTEGER. Must be 1 or 2 or 3.itypeIf itype = 1, the generalized eigenproblem is A*z =lambda*B*zfor uplo = 'U': C = inv(UT)*A*inv(U), z = inv(U)*y;for uplo = 'L': C = inv(L)*A*inv(LT), z = inv(LT)*y.

757


If itype = 2, the generalized eigenproblem is A*B*z =lambda*zfor uplo = 'U': C = U*A*UT, z = inv(U)*y;for uplo = 'L': C = LT*A*L, z = inv(LT)*y.If itype = 3, the generalized eigenproblem is B*A*z =lambda*zfor uplo = 'U': C = U*A*UT, z = UT*y;for uplo = 'L': C = LT*A*L, z = L*y.

CHARACTER*1. Must be 'U' or 'L'.uploIf uplo = 'U', the array a stores the upper triangle of A;you must supply B in the factored form B = UT*U.If uplo = 'L', the array a stores the lower triangle of A;you must supply B in the factored form B = L*LT.

INTEGER. The order of the matrices A and B (n ≥ 0).n

REAL for ssygsta, bDOUBLE PRECISION for dsygst.Arrays:a(lda,*) contains the upper or lower triangle of A.The second dimension of a must be at least max(1, n).b(ldb,*) contains the Cholesky-factored matrix B:B = UT*U or B = L*LT (as returned by ?potrf).The second dimension of b must be at least max(1, n).



Output Parameters

The upper or lower triangle of A is overwritten by the upperor lower triangle of C, as specified by the arguments itypeand uplo.

a


758




Specific details for the routine sygst interface are the following:


Holds the matrix B of size (n,n).b

Must be 1, 2, or 3. The default value is 1.itype


Application Notes

Forming the reduced matrix C is a stable procedure. However, it involves implicit multiplicationby B-1 (if itype = 1) or B (if itype = 2 or 3). When the routine is used as a step in thecomputation of eigenvalues and eigenvectors of the original problem, there may be a significantloss of accuracy if B is ill-conditioned with respect to inversion.

The approximate number of floating-point operations is n3.

?hegstReduces a complex Hermitian-definite generalizedeigenvalue problem to the standard form.

Syntax

Fortran 77:

call chegst(itype, uplo, n, a, lda, b, ldb, info)

call zhegst(itype, uplo, n, a, lda, b, ldb, info)

Fortran 95:

call hegst(a, b [,itype] [,uplo] [,info])

Description

This routine reduces complex Hermitian-definite generalized eigenvalue problems


759


to the standard form Cy = λy. Here the matrix A is complex Hermitian, and B is complexHermitian positive-definite. Before calling this routine, you must call ?potrf to compute theCholesky factorization: B = UH*U or B = L*LH.

Input Parameters

INTEGER. Must be 1 or 2 or 3.itypeIf itype = 1, the generalized eigenproblem is A*z =lambda*B*zfor uplo = 'U': C = inv(UH)*A*inv(U), z = inv(U)*y;for uplo = 'L': C = inv(L)*A*inv(LH), z = inv(LH)*y.If itype = 2, the generalized eigenproblem is A*B*z =lambda*zfor uplo = 'U': C = U*A*UH, z = inv(U)*y;for uplo = 'L': C = LH*A*L, z = inv(LH)*y.If itype = 3, the generalized eigenproblem is B*A*z =lambda*zfor uplo = 'U': C = U*A*UH, z = UH*y;for uplo = 'L': C = LH*A*L, z = L*y.

CHARACTER*1. Must be 'U' or 'L'.uploIf uplo = 'U', the array a stores the upper triangle of A;you must supply B in the factored form B = UH*U.If uplo = 'L', the array a stores the lower triangle of A;you must supply B in the factored form B = L*LH.


COMPLEX for chegstDOUBLE COMPLEX for zhegst.a, bArrays:a(lda,*) contains the upper or lower triangle of A.The second dimension of a must be at least max(1, n).b(ldb,*) contains the Cholesky-factored matrix B:B = UH*U or B = L*LH (as returned by ?potrf).The second dimension of b must be at least max(1, n).



760


Output Parameters


a




Specific details for the routine hegst interface are the following:





Application Notes

Forming the reduced matrix C is a stable procedure. However, it involves implicit multiplicationby inv(B) (if itype = 1) or B (if itype = 2 or 3). When the routine is used as a step in thecomputation of eigenvalues and eigenvectors of the original problem, there may be a significantloss of accuracy if B is ill-conditioned with respect to inversion.


761


?spgstReduces a real symmetric-definite generalizedeigenvalue problem to the standard form usingpacked storage.

Syntax

Fortran 77:

call sspgst(itype, uplo, n, ap, bp, info)

call dspgst(itype, uplo, n, ap, bp, info)

Fortran 95:

call spgst(a, b [,itype] [,uplo] [,info])

Description



to the standard form Cy = λy, using packed matrix storage. Here A is a real symmetric matrix,and B is a real symmetric positive-definite matrix. Before calling this routine, call ?pptrf tocompute the Cholesky factorization: B = UT*U or B = L*LT.

Input Parameters

INTEGER. Must be 1 or 2 or 3.itypeIf itype = 1, the generalized eigenproblem is A*z =lambda*B*zfor uplo = 'U': C = inv(UT)*A*inv(U), z = inv(U)*y;for uplo = 'L': C = inv(L)*A*inv(LT), z = inv(LT)*y.If itype = 2, the generalized eigenproblem is A*B*z =lambda*zfor uplo = 'U': C = U*A*UT, z = inv(U)*y;for uplo = 'L': C = LT*A*L, z = inv(LT)*y.If itype = 3, the generalized eigenproblem is B*A*z =lambda*zfor uplo = 'U': C = U*A*UT, z = UT*y;for uplo = 'L': C = LT*A*L, z = L*y.

762


CHARACTER*1. Must be 'U' or 'L'.uploIf uplo = 'U', ap stores the packed upper triangle of A;you must supply B in the factored form B = UT*U.If uplo = 'L', ap stores the packed lower triangle of A;you must supply B in the factored form B = L*LT.


REAL for sspgstap, bpDOUBLE PRECISION for dspgst.Arrays:ap(*) contains the packed upper or lower triangle of A.The dimension of ap must be at least max(1, n*(n+1)/2).

p(*) contains the packed Cholesky factor of B (as returnedby ?pptrf with the same uplo value).

b

The dimension of bp must be at least max(1, n*(n+1)/2).

Output Parameters


ap




Specific details for the routine spgst interface are the following:


a

Stands for argument bp in Fortan 77 interface. Holds the array B of size(n*(n+1)/2).

b



763


Application Notes

Forming the reduced matrix C is a stable procedure. However, it involves implicit multiplicationby B-1 (if itype = 1) or B (if itype = 2 or 3). When the routine is used as a step in thecomputation of eigenvalues and eigenvectors of the original problem, there may be a significantloss of accuracy if B is ill-conditioned with respect to inversion.


?hpgstReduces a complex Hermitian-definite generalizedeigenvalue problem to the standard form usingpacked storage.

Syntax

Fortran 77:

call chpgst(itype, uplo, n, ap, bp, info)

call zhpgst(itype, uplo, n, ap, bp, info)

Fortran 95:

call hpgst(a, b [,itype] [,uplo] [,info])

Description



to the standard form Cy = λyl, using packed matrix storage. Here A is a real symmetric matrix,and B is a real symmetric positive-definite matrix. Before calling this routine, you must call?pptrf to compute the Cholesky factorization: B = UH*U or B = L*LH.

Input Parameters

INTEGER. Must be 1 or 2 or 3.itypeIf itype = 1, the generalized eigenproblem is A*z =lambda*B*zfor uplo = 'U': C = inv(UH)*A*inv(U), z = inv(U)*y;for uplo = 'L': C = inv(L)*A*inv(LH), z = inv(LH)*y.

764


If itype = 2, the generalized eigenproblem is A*B*z =lambda*zfor uplo = 'U': C = U*A*UH, z = inv(U)*y;for uplo = 'L': C = LH*A*L, z = inv(LH)*y.If itype = 3, the generalized eigenproblem is B*A*z =lambda*zfor uplo = 'U': C = U*A*UH, z = UH*y;for uplo = 'L': C = LH*A*L, z = L*y.

CHARACTER*1. Must be 'U' or 'L'.uploIf uplo = 'U', ap stores the packed upper triangle of A;you must supply B in the factored form B = UH*U.If uplo = 'L', ap stores the packed lower triangle of A;you must supply B in the factored form B = L*LH.


COMPLEX for chpgstDOUBLE COMPLEX for zhpgst.ap, bpArrays:ap(*) contains the packed upper or lower triangle of A.The dimension of a must be at least max(1, n*(n+1)/2).bp(*) contains the packed Cholesky factor of B (as returnedby ?pptrf with the same uplo value).The dimension of b must be at least max(1, n*(n+1)/2).

Output Parameters


ap




Specific details for the routine hpgst interface are the following:

765



a


b



Application Notes

Forming the reduced matrix C is a stable procedure. However, it involves implicit multiplicationby inv(B) (if itype = 1) or B (if itype = 2 or 3). When the routine is used as a step in thecomputation of eigenvalues and eigenvectors of the original problem, there may be a significantloss of accuracy if B is ill-conditioned with respect to inversion.


?sbgstReduces a real symmetric-definite generalizedeigenproblem for banded matrices to the standardform using the factorization performed by ?pbstf.

Syntax

Fortran 77:

call ssbgst(vect, uplo, n, ka, kb, ab, ldab, bb, ldbb, x, ldx, work, info)

call dsbgst(vect, uplo, n, ka, kb, ab, ldab, bb, ldbb, x, ldx, work, info)

Fortran 95:

call sbgst(a, b [,x] [,uplo] [,info])

Description

To reduce the real symmetric-definite generalized eigenproblem Az = λBz to the standard

form Cy = λy, where A, B and C are banded, this routine must be preceded by a call tospbstf/dpbstf, which computes the split Cholesky factorization of the positive-definite matrixB: B = ST*S. The split Cholesky factorization, compared with the ordinary Choleskyfactorization, allows the work to be approximately halved.

766


This routine overwrites A with C = XT*A*X, where X = inv(S)*Q and Q is an orthogonal matrixchosen (implicitly) to preserve the bandwidth of A. The routine also has an option to allow theaccumulation of X, and then, if z is an eigenvector of C, Xz is an eigenvector of the originalsystem.

Input Parameters

CHARACTER*1. Must be 'N' or 'V'.vectIf vect = 'N', then matrix X is not returned;If vect = 'V', then matrix X is returned.



INTEGER. The number of super- or sub-diagonals in Aka

(ka ≥ 0).

INTEGER. The number of super- or sub-diagonals in Bkb

(ka ≥ kb ≥ 0).

REAL for ssbgstab, bb, workDOUBLE PRECISION for dsbgstab (ldab,*) is an array containing either upper or lowertriangular part of the symmetric matrix A (as specified byuplo) in band storage format.The second dimension of the array ab must be at leastmax(1, n).bb (ldbb,*) is an array containing the banded split Choleskyfactor of B as specified by uplo, n and kb and returned byspbstf/dpbstf.The second dimension of the array bb must be at leastmax(1, n).work(*) is a workspace array, dimension at least max(1,2*n)

INTEGER. The first dimension of the array ab; must be atleast ka+1.

ldab

INTEGER. The first dimension of the array bb; must be atleast kb+1.

ldbb

767


The first dimension of the output array x. Constraints: if

vect = 'N', then ldx ≥ 1;

ldx

if vect = 'V', then ldx ≥ max(1, n).

Output Parameters

On exit, this array is overwritten by the upper or lowertriangle of C as specified by uplo.

ab

REAL for ssbgstxDOUBLE PRECISION for dsbgstArray.If vect = 'V', then x (ldx,*) contains the n-by-n matrixX = inv(S)*Q.If vect = 'N', then x is not referenced.The second dimension of x must be:at least max(1, n), if vect = 'V';at least 1, if vect = 'N'.




Specific details for the routine sbgst interface are the following:

Stands for argument ab in Fortan 77 interface. Holds the array A of size(ka+1,n).

a

Stands for argument bb in Fortan 77 interface. Holds the array B of size(kb+1,n).

b

Holds the matrix X of size (n,n).x


Restored based on the presence of the argument x as follows:vectvect = 'V', if x is present,vect = 'N', if x is omitted.

768


Application Notes

Forming the reduced matrix C involves implicit multiplication by B-1. When the routine is usedas a step in the computation of eigenvalues and eigenvectors of the original problem, theremay be a significant loss of accuracy if B is ill-conditioned with respect to inversion. The totalnumber of floating-point operations is approximately 6n2*kb, when vect = 'N'. Additional(3/2)n3*(kb/ka) operations are required when vect = 'V'. All these estimates assume thatboth ka and kb are much less than n.

?hbgstReduces a complex Hermitian-definite generalizedeigenproblem for banded matrices to the standardform using the factorization performed by ?pbstf.

Syntax

Fortran 77:

call chbgst(vect, uplo, n, ka, kb, ab, ldab, bb, ldbb, x, ldx, work, rwork,info)

call zhbgst(vect, uplo, n, ka, kb, ab, ldab, bb, ldbb, x, ldx, work, rwork,info)

Fortran 95:

call hbgst(a, b [,x] [,uplo] [,info])

Description

To reduce the complex Hermitian-definite generalized eigenproblem Az = λBz to the standard

form Cy = λy, where A, B and C are banded, this routine must be preceded by a call tocpbstf/zpbstf, which computes the split Cholesky factorization of the positive-definite matrixB: B = SH*S. The split Cholesky factorization, compared with the ordinary Choleskyfactorization, allows the work to be approximately halved.

This routine overwrites A with C = XH*A*X, where X = inv(S)*Q, and Q is a unitary matrixchosen (implicitly) to preserve the bandwidth of A. The routine also has an option to allow theaccumulation of X, and then, if z is an eigenvector of C, Xz is an eigenvector of the originalsystem.

769


Input Parameters

CHARACTER*1. Must be 'N' or 'V'.vectIf vect = 'N', then matrix X is not returned;If vect = 'V', then matrix X is returned.




(ka ≥ 0).


(ka ≥ kb ≥ 0).

COMPLEX for chbgstDOUBLE COMPLEX for zhbgstab, bb, workab (ldab,*) is an array containing either upper or lowertriangular part of the Hermitian matrix A (as specified byuplo) in band storage format.The second dimension of the array ab must be at leastmax(1, n).bb (ldbb,*) is an array containing the banded split Choleskyfactor of B as specified by uplo, n and kb and returned bycpbstf/zpbstf.The second dimension of the array bb must be at leastmax(1, n).work(*) is a workspace array, dimension at least max(1,n)


ldab


ldbb

The first dimension of the output array x. Constraints:ldx

if vect = 'N', then ldx ≥ 1;

if vect = 'V', then ldx ≥ max(1, n).

REAL for chbgstrworkDOUBLE PRECISION for zhbgst

770


Workspace array, dimension at least max(1, n)

Output Parameters

On exit, this array is overwritten by the upper or lowertriangle of C as specified by uplo.

ab

COMPLEX for chbgstxDOUBLE COMPLEX for zhbgstArray.If vect = 'V', then x (ldx,*) contains the n-by-n matrixX = inv(S)*Q.If vect = 'N', then x is not referenced.The second dimension of x must be:at least max(1, n), if vect = 'V';at least 1, if vect = 'N'.




Specific details for the routine hbgst interface are the following:


a


b

Holds the matrix X of size (n,n).x


Restored based on the presence of the argument x as follows: vect= 'V', if x is present, vect = 'N', if x is omitted.

vect

771


Application Notes

Forming the reduced matrix C involves implicit multiplication by inv(B). When the routine isused as a step in the computation of eigenvalues and eigenvectors of the original problem,there may be a significant loss of accuracy if B is ill-conditioned with respect to inversion. Thetotal number of floating-point operations is approximately 20n2*kb, when vect = 'N'.Additional 5n3*(kb/ka) operations are required when vect = 'V'. All these estimates assumethat both ka and kb are much less than n.

?pbstfComputes a split Cholesky factorization of a realsymmetric or complex Hermitian positive-definitebanded matrix used in ?sbgst/?hbgst .

Syntax

Fortran 77:

call spbstf(uplo, n, kb, bb, ldbb, info)

call dpbstf(uplo, n, kb, bb, ldbb, info)

call cpbstf(uplo, n, kb, bb, ldbb, info)

call zpbstf(uplo, n, kb, bb, ldbb, info)

Fortran 95:

call pbstf(b [, uplo] [,info])

Description

This routine computes a split Cholesky factorization of a real symmetric or complex Hermitianpositive-definite band matrix B. It is to be used in conjunction with ?sbgst/?hbgst.

The factorization has the form B = ST*S (or B = SH*S for complex flavors), where S is a bandmatrix of the same bandwidth as B and the following structure: S is upper triangular in the first(n+kb)/2 rows and lower triangular in the remaining rows.

Input Parameters

CHARACTER*1. Must be 'U' or 'L'.uploIf uplo = 'U', bb stores the upper triangular part of B.If uplo = 'L', bb stores the lower triangular part of B.

772




(kb ≥ 0).

REAL for spbstfbbDOUBLE PRECISION for dpbstfCOMPLEX for cpbstfDOUBLE COMPLEX for zpbstf.bb (ldbb,*) is an array containing either upper or lowertriangular part of the matrix B (as specified by uplo) in bandstorage format.The second dimension of the array bb must be at leastmax(1, n).

INTEGER. The first dimension of bb; must be at least kb+1.ldbb

Output Parameters

On exit, this array is overwritten by the elements of thesplit Cholesky factor S.

bb

INTEGER.infoIf info = 0, the execution is successful.If info = i, then the factorization could not be completed,because the updated element bii would be the square rootof a negative number; hence the matrix B is notpositive-definite.If info = -i, the ith parameter had an illegal value.



Specific details for the routine pbstf interface are the following:


b


773


Application Notes

The computed factor S is the exact factor of a perturbed matrix B + E, where


The total number of floating-point operations for real flavors is approximately n(kb+1)2. Thenumber of operations for complex flavors is 4 times greater. All these estimates assume thatkb is much less than n.

After calling this routine, you can call ?sbgst/?hbgst to solve the generalized eigenproblem

Az = λBz, where A and B are banded and B is positive-definite.


This section describes LAPACK routines for solving nonsymmetric eigenvalue problems,computing the Schur factorization of general matrices, as well as performing a number of relatedcomputational tasks.

A nonsymmetric eigenvalue problem is as follows: given a nonsymmetric (or non-Hermitian)

matrix A, find the eigenvalues λ and the corresponding eigenvectors z that satisfy theequation

Az = λz (right eigenvectors z)

or the equation

zHA = λzH (left eigenvectors z).

Nonsymmetric eigenvalue problems have the following properties:

• The number of eigenvectors may be less than the matrix order (but is not less than thenumber of distinct eigenvalues of A).

• Eigenvalues may be complex even for a real matrix A.

• If a real nonsymmetric matrix has a complex eigenvalue a+bi corresponding to an eigenvectorz, then a-bi is also an eigenvalue. The eigenvalue a-bi corresponds to the eigenvectorwhose elements are complex conjugate to the elements of z.

774


To solve a nonsymmetric eigenvalue problem with LAPACK, you usually need to reduce thematrix to the upper Hessenberg form and then solve the eigenvalue problem with the Hessenbergmatrix obtained. Table 4-5 lists LAPACK routines (Fortran-77 interface) for reducing the matrixto the upper Hessenberg form by an orthogonal (or unitary) similarity transformation A = QHQH

as well as routines for solving eigenvalue problems with Hessenberg matrices, forming theSchur factorization of such matrices and computing the corresponding condition numbers.Respective routine names in Fortran-95 interface are without the first symbol (see RoutineNaming Conventions).

Decision tree in Figure 4-4 helps you choose the right routine or sequence of routines for aneigenvalue problem with a real nonsymmetric matrix. If you need to solve an eigenvalue problemwith a complex non-Hermitian matrix, use the decision tree shown in Figure 4-5 .

Table 4-5 Computational Routines for Solving Nonsymmetric Eigenvalue Problems

Routines for complexmatrices

Routines for real matricesOperation performed

?gehrd?gehrd,Reduce to Hessenbergform A = QHQH

?unghr?orghrGenerate the matrix Q

?unmhr?ormhrApply the matrix Q

?gebal?gebalBalance matrix

?gebak?gebakTransform eigenvectors ofbalanced matrix to thoseof the original matrix

?hseqr?hseqrFind eigenvalues andSchur factorization (QRalgorithm)

?hsein?hseinFind eigenvectors fromHessenberg form (inverseiteration)

?trevc?trevcFind eigenvectors fromSchur factorization

?trsna?trsnaEstimate sensitivities ofeigenvalues andeigenvectors

775


Routines for complexmatrices

Routines for real matricesOperation performed

?trexc?trexcReorder Schurfactorization

?trsen?trsenReorder Schurfactorization, find theinvariant subspace andestimate sensitivities

?trsyl?trsylSolves Sylvester'sequation.

776


Figure 4-4 Decision Tree: Real Nonsymmetric Eigenvalue Problems

777


Figure 4-5 Decision Tree: Complex Non-Hermitian Eigenvalue Problems

778


?gehrdReduces a general matrix to upper Hessenbergform.

Syntax

Fortran 77:

call sgehrd(n, ilo, ihi, a, lda, tau, work, lwork, info)

call dgehrd(n, ilo, ihi, a, lda, tau, work, lwork, info)

call cgehrd(n, ilo, ihi, a, lda, tau, work, lwork, info)

call zgehrd(n, ilo, ihi, a, lda, tau, work, lwork, info)

Fortran 95:

call gehrd(a [, tau] [,ilo] [,ihi] [,info])

Description

The routine reduces a general matrix A to upper Hessenberg form H by an orthogonal or unitarysimilarity transformation A = Q*H*QH. Here H has real subdiagonal elements.

The routine does not form the matrix Q explicitly. Instead, Q is represented as a product ofelementary reflectors. Routines are provided to work with Q in this representation.

Input Parameters


INTEGER. If A has been output by ?gebal, then ilo andihi must contain the values returned by that routine.

Otherwise ilo = 1 and ihi = n. (If n > 0, then 1 ≤ ilo

≤ ihi ≤ n; if n = 0, ilo = 1 and ihi = 0.)

ilo, ihi

REAL for sgehrda, workDOUBLE PRECISION for dgehrdCOMPLEX for cgehrdDOUBLE COMPLEX for zgehrd.Arrays:a (lda,*) contains the matrix A.The second dimension of a must be at least max(1, n).

779


work (lwork) is a workspace array.



Output Parameters

Overwritten by the upper Hessenberg matrix H and detailsof the matrix Q. The subdiagonal elements of H are real.

a

REAL for sgehrdtauDOUBLE PRECISION for dgehrdCOMPLEX for cgehrdDOUBLE COMPLEX for zgehrd.Array, DIMENSION at least max (1, n-1).Contains additional information on the matrix Q.


work(1)

INTEGER.infoIf info = 0, the execution is successful.If info = -i , the i-th parameter had an illegal value.



Specific details for the routine gehrd interface are the following:



Default value for this argument is ilo = 1.ilo

Default value for this argument is ihi = n.ihi

780


Application Notes






The computed Hessenberg matrix H is exactly similar to a nearby matrix A + E, where ||E||2

< c(n)ε||A||2, c(n) is a modestly increasing function of n, and ε is the machine precision.

The approximate number of floating-point operations for real flavors is (2/3)(ihi -ilo)2(2ihi + 2ilo + 3n); for complex flavors it is 4 times greater.

?orghrGenerates the real orthogonal matrix Q determinedby ?gehrd.

Syntax

Fortran 77:

call sorghr(n, ilo, ihi, a, lda, tau, work, lwork, info)

call dorghr(n, ilo, ihi, a, lda, tau, work, lwork, info)

Fortran 95:

call orghr(a, tau [,ilo] [,ihi] [,info])

781

Description

This routine explicitly generates the orthogonal matrix Q that has been determined by a precedingcall to sgehrd/dgehrd. (The routine ?gehrd reduces a real general matrix A to upper Hessenbergform H by an orthogonal similarity transformation, A = Q*H*QT, and represents the matrix Q

as a product of ihi-ilo elementary reflectors. Here ilo and ihi are values determinedby sgebal/dgebal when balancing the matrix;if the matrix has not been balanced, ilo = 1and ihi = n.)

The matrix Q generated by ?orghr has the structure:

where Q22 occupies rows and columns ilo to ihi.

Input Parameters


INTEGER. These must be the same parameters ilo and ihi,

respectively, as supplied to ?gehrd. (If n > 0, then 1 ≤

ilo ≤ ihi ≤ n; if n = 0, ilo = 1 and ihi = 0.)

ilo, ihi

REAL for sorghra, tau, workDOUBLE PRECISION for dorghrArrays: a(lda,*) contains details of the vectors which definethe elementary reflectors, as returned by ?gehrd.The second dimension of a must be at least max(1, n).tau(*) contains further details of the elementary reflectors,as returned by ?gehrd.The dimension of tau must be at least max (1, n-1).work is a workspace array, its dimension max(1, lwork).



782


lwork ≥ max(1, ihi-ilo).If lwork = -1, then a workspace query is assumed; theroutine only calculates the optimal size of the work array,returns this value as the first entry of the work array, andno error message related to lwork is issued by xerbla.See Application Notes for the suggested value of lwork.

Output Parameters

Overwritten by the n-by-n orthogonal matrix Q.a


work(1)




Specific details for the routine orghr interface are the following:





Application Notes

For better performance, try using lwork =(ihi-ilo)*blocksize where blocksize is amachine-dependent value (typically, 16 to 64) required for optimum performance of the blockedalgorithm.


783





The computed matrix Q differs from the exact result by a matrix E such that ||E||2 = O(ε),

where ε is the machine precision.

The approximate number of floating-point operations is (4/3)(ihi-ilo)3.

The complex counterpart of this routine is ?unghr.

?ormhrMultiplies an arbitrary real matrix C by the realorthogonal matrix Q determined by ?gehrd.

Syntax

Fortran 77:

call sormhr(side, trans, m, n, ilo, ihi, a, lda, tau, c, ldc, work, lwork,info)

call dormhr(side, trans, m, n, ilo, ihi, a, lda, tau, c, ldc, work, lwork,info)

Fortran 95:

call ormhr(a, tau, c [,ilo] [,ihi] [,side] [,trans] [,info])

Description

This routine multiplies a matrix C by the orthogonal matrix Q that has been determined by apreceding call to sgehrd/dgehrd. (The routine ?gehrd reduces a real general matrix A to upperHessenberg form H by an orthogonal similarity transformation, A = Q*H*QT, and represents

784


the matrix Q as a product of ihi-ilo elementary reflectors. Here ilo and ihi are valuesdetermined by sgebal/dgebal when balancing the matrix;if the matrix has not been balanced,ilo = 1 and ihi = n.)

With ?ormhr, you can form one of the matrix products Q*C, Q T*C, C*Q, or C*Q T, overwritingthe result on C (which may be any real rectangular matrix).

A common application of ?ormhr is to transform a matrix V of eigenvectors of H to the matrixQV of eigenvectors of A.

Input Parameters

CHARACTER*1. Must be 'L' or 'R'.sideIf side = 'L', then the routine forms Q*C or QT*C.If side = 'R', then the routine forms C*Q or C*QT.

CHARACTER*1. Must be 'N' or 'T'.transIf trans = 'N' , then Q is applied to C.If trans = 'T', then QT is applied to C.

INTEGER. The number of rows in C (m ≥ 0).m


INTEGER. These must be the same parameters ilo and ihi,respectively, as supplied to ?gehrd.

ilo, ihi

If m > 0 and side = 'L', then 1 ≤ ilo ≤ ihi ≤ m.If m = 0 and side = 'L' , then ilo = 1 and ihi = 0.

If n > 0 and side = 'R', then 1 ≤ ilo ≤ ihi ≤ n.If n = 0 and side = 'R' , then ilo = 1 and ihi = 0.

REAL for sormhra, tau, c, workDOUBLE PRECISION for dormhrArrays:a(lda,*) contains details of the vectors which define theelementary reflectors, as returned by ?gehrd.The second dimension of a must be at least max(1, m) ifside = 'L' and at least max(1, n) if side = 'R'.tau(*) contains further details of the elementaryreflectors, as returned by ?gehrd .The dimension of tau must be at least max (1, m-1)if side = 'L' and at least max (1, n-1) if side = 'R'.

785


c(ldc,*) contains the m by n matrix C. The second dimensionof c must be at least max(1, n).work is a workspace array, its dimension max(1,lwork).

INTEGER. The first dimension of a; at least max(1, m) ifside = 'L' and at least max (1, n) if side = 'R'.

lda

INTEGER. The first dimension of c; at least max(1, m).ldc


If side = 'L', lwork ≥ max(1, n).

If side = 'R', lwork ≥ max(1, m).If lwork = -1, then a workspace query is assumed; theroutine only calculates the optimal size of the work array,returns this value as the first entry of the work array, andno error message related to lwork is issued by xerbla.See Application Notes for the suggested value of lwork.

Output Parameters

C is overwritten by Q*C or QT*C or C*QT or C*Q as specifiedby side and trans.

c


work(1)




Specific details for the routine ormhr interface are the following:



786







Application Notes

For better performance, lwork should be at least n*blocksize if side = 'L' and at leastm*blocksize if side = 'R', where blocksize is a machine-dependent value (typically, 16 to64) required for optimum performance of the blocked algorithm.





The computed matrix Q differs from the exact result by a matrix E such that ||E||2 =

O(ε)||C||2, where ε is the machine precision.

The approximate number of floating-point operations is

2n(ihi-ilo)2 if side = 'L';

2m(ihi-ilo)2 if side = 'R'.

The complex counterpart of this routine is ?unmhr.

787


?unghrGenerates the complex unitary matrix Qdetermined by ?gehrd.

Syntax

Fortran 77:

call cunghr(n, ilo, ihi, a, lda, tau, work, lwork, info)

call zunghr(n, ilo, ihi, a, lda, tau, work, lwork, info)

Fortran 95:

call unghr(a, tau [,ilo] [,ihi] [,info])

Description

This routine is intended to be used following a call to cgehrd/zgehrd, which reduces a complexmatrix A to upper Hessenberg form H by a unitary similarity transformation: A = Q*H*QH.?gehrd represents the matrix Q as a product of ihi-ilo elementary reflectors. Here iloand ihi are values determined by cgebal/zgebal when balancing the matrix; if the matrixhas not been balanced, ilo = 1 and ihi = n.

Use the routine ?unghr to generate Q explicitly as a square matrix. The matrix Q has thestructure:

where Q22 occupies rows and columns ilo to ihi.

Input Parameters


788


INTEGER. These must be the same parameters ilo and ihi,

respectively, as supplied to ?gehrd . (If n > 0, then 1 ≤

ilo ≤ ihi ≤ n. If n = 0, then ilo = 1 and ihi = 0.)

ilo, ihi

COMPLEX for cunghra, tau, workDOUBLE COMPLEX for zunghr.Arrays:a(lda,*) contains details of the vectors which define theelementary reflectors, as returned by ?gehrd.The second dimension of a must be at least max(1, n).tau(*) contains further details of the elementaryreflectors, as returned by ?gehrd .The dimension of tau must be at least max (1, n-1).work is a workspace array, its dimension max(1, lwork).



lwork ≥ max(1, ihi-ilo).If lwork = -1, then a workspace query is assumed; theroutine only calculates the optimal size of the work array,returns this value as the first entry of the work array, andno error message related to lwork is issued by xerbla.See Application Notes for the suggested value of lwork.

Output Parameters

Overwritten by the n-by-n unitary matrix Q.a


work(1)




789


Specific details for the routine unghr interface are the following:





Application Notes

For better performance, try using lwork = (ihi-ilo)*blocksize, where blocksize is amachine-dependent value (typically, 16 to 64) required for optimum performance of the blockedalgorithm.





The computed matrix Q differs from the exact result by a matrix E such that ||E||2 = O(ε),


The approximate number of real floating-point operations is (16/3)(ihi-ilo)3.

The real counterpart of this routine is ?orghr.

790


?unmhrMultiplies an arbitrary complex matrix C by thecomplex unitary matrix Q determined by ?gehrd.

Syntax

Fortran 77:

call cunmhr(side, trans, m, n, ilo, ihi, a, lda, tau, c, ldc, work, lwork,info)

call zunmhr(side, trans, m, n, ilo, ihi, a, lda, tau, c, ldc, work, lwork,info)

Fortran 95:

call unmhr(a, tau, c [,ilo] [,ihi] [,side] [,trans] [,info])

Description

This routine multiplies a matrix C by the unitary matrix Q that has been determined by apreceding call to cgehrd/zgehrd. (The routine ?gehrd reduces a real general matrix A to upperHessenberg form H by an orthogonal similarity transformation, A = Q*H*QH, and representsthe matrix Q as a product of ihi-iloelementary reflectors. Here ilo and ihi are valuesdetermined by cgebal/zgebal when balancing the matrix; if the matrix has not been balanced,ilo = 1 and ihi = n.)

With ?unmhr, you can form one of the matrix products Q*C, QH*C, C*Q, or C*QH, overwriting theresult on C (which may be any complex rectangular matrix). A common application of thisroutine is to transform a matrix V of eigenvectors of H to the matrix QV of eigenvectors of A.

Input Parameters

CHARACTER*1. Must be 'L' or 'R'.sideIf side = 'L', then the routine forms Q*C or QH*C.If side = 'R', then the routine forms C*Q or C*QH.

CHARACTER*1. Must be 'N' or 'C'.transIf trans = 'N' , then Q is applied to C.If trans = 'T', then QH is applied to C.

INTEGER. The number of rows in C (m≥ 0).m

791


INTEGER. The number of columns in C (n≥ 0).n

INTEGER. These must be the same parameters ilo and ihi,respectively, as supplied to ?gehrd .

ilo, ihi

If m > 0 and side = 'L', then 1 ≤ilo≤ihi≤m.If m = 0 and side = 'L' , then ilo = 1 and ihi = 0.

If n > 0 and side = 'R', then 1 ≤ilo≤ihi≤n.If n = 0 and side = 'R' , then ilo =1 and ihi = 0.

COMPLEX for cunmhra, tau, c, workDOUBLE COMPLEX for zunmhr.Arrays:a (lda,*) contains details of the vectors which define theelementary reflectors, as returned by ?gehrd.The second dimension of a must be at least max(1, m) ifside = 'L' and at least max(1, n) if side = 'R'.tau(*) contains further details of the elementary reflectors,as returned by ?gehrd.The dimension of tau must be at least max (1, m-1)if side = 'L' and at least max (1, n-1) if side = 'R'.c (ldc,*) contains the m-by-n matrix C. The seconddimension of c must be at least max(1, n).work is a workspace array, its dimension max(1, lwork).

INTEGER. The first dimension of a; at least max(1, m) ifside = 'L' and at least max (1, n) if side = 'R'.

lda



If side = 'L', lwork≥ max(1,n).

If side = 'R', lwork≥ max(1,m).If lwork = -1, then a workspace query is assumed; theroutine only calculates the optimal size of the work array,returns this value as the first entry of the work array, andno error message related to lwork is issued by xerbla.See Application Notes for the suggested value of lwork.

792


Output Parameters

C is overwritten by Q*C, or QH*C, or C*QH, or C*Q as specifiedby side and trans.

c


work(1)




Specific details for the routine unmhr interface are the following:








Application Notes

For better performance, lwork should be at least n*blocksize if side = 'L' and at leastm*blocksize if side = 'R', where blocksize is a machine-dependent value (typically, 16 to64) required for optimum performance of the blocked algorithm.


793





The computed matrix Q differs from the exact result by a matrix E such that ||E||2 =

O(ε)||C||2, where ε is the machine precision.


8n(ihi-ilo)2 if side = 'L';

8m(ihi-ilo)2 if side = 'R'.

The real counterpart of this routine is ?ormhr.

?gebalBalances a general matrix to improve the accuracyof computed eigenvalues and eigenvectors.

Syntax

Fortran 77:

call sgebal(job, n, a, lda, ilo, ihi, scale, info)

call dgebal(job, n, a, lda, ilo, ihi, scale, info)

call cgebal(job, n, a, lda, ilo, ihi, scale, info)

call zgebal(job, n, a, lda, ilo, ihi, scale, info)

Fortran 95:

call gebal(a [, scale] [,ilo] [,ihi] [,job] [,info])

794


Description

This routine balances a matrix A by performing either or both of the following two similaritytransformations:

(1) The routine first attempts to permute A to block upper triangular form:

where P is a permutation matrix, and A'11 and A'33 are upper triangular. The diagonal elementsof A'11 and A'33 are eigenvalues of A. The rest of the eigenvalues of A are the eigenvalues ofthe central diagonal block A'22, in rows and columns ilo to ihi. Subsequent operations tocompute the eigenvalues of A (or its Schur factorization) need only be applied to these rowsand columns; this can save a significant amount of work if ilo > 1 and ihi < n.

If no suitable permutation exists (as is often the case), the routine sets ilo = 1 and ihi =n, and A'22 is the whole of A.

(2) The routine applies a diagonal similarity transformation to A', to make the rows and columnsof A'22 as close in norm as possible:

This scaling can reduce the norm of the matrix (that is, ||A'2'2|| < ||A'22||), and hencereduce the effect of rounding errors on the accuracy of computed eigenvalues and eigenvectors.

Input Parameters

CHARACTER*1. Must be 'N' or 'P' or 'S' or 'B'.job

795

If job = 'N', then A is neither permuted nor scaled (butilo, ihi, and scale get their values).If job = 'P', then A is permuted but not scaled.If job = 'S', then A is scaled but not permuted.If job = 'B', then A is both scaled and permuted.


REAL for sgebalaDOUBLE PRECISION for dgebalCOMPLEX for cgebalDOUBLE COMPLEX for zgebal.Arrays:a (lda,*) contains the matrix A.The second dimension of a must be at least max(1, n). a isnot referenced if job = 'N'.


Output Parameters

Overwritten by the balanced matrix (a is not referenced ifjob = 'N').

a

INTEGER. The values ilo and ihi such that on exit a(i,j)

is zero if i > j and 1 ≤ j < ilo or ihi < i ≤ n.

ilo, ihi

If job = 'N' or 'S', then ilo = 1 and ihi = n.

REAL for single-precision flavors DOUBLE PRECISION fordouble-precision flavors

scale

Array, DIMENSION at least max(1, n).Contains details of the permutations and scaling factors.More precisely, if pj is the index of the row and columninterchanged with row and column j, and dj is the scalingfactor used to balance row and column j, thenscale(j) = pj for j = 1, 2,..., ilo-1, ihi+1,...,n;scale(j) = dj for j = ilo, ilo + 1,..., ihi.The order in which the interchanges are made is n to ihi+1,then 1 to ilo-1.

INTEGER.info

796

If info = 0, the execution is successful.If info = -i, the i-th parameter had an illegal value.



Specific details for the routine gebal interface are the following:


Holds the vector of length (n).scale



Must be 'B', 'S', 'P', or 'N'. The default value is 'B'.job

Application Notes

The errors are negligible, compared with those in subsequent computations.

If the matrix A is balanced by this routine, then any eigenvectors computed subsequently areeigenvectors of the matrix A'' and hence you must call ?gebak to transform them back toeigenvectors of A.

If the Schur vectors of A are required, do not call this routine with job = 'S' or 'B', becausethen the balancing transformation is not orthogonal (not unitary for complex flavors).

If you call this routine with job = 'P', then any Schur vectors computed subsequently areSchur vectors of the matrix A'', and you need to call ?gebak (with side = 'R') to transformthem back to Schur vectors of A.

The total number of floating-point operations is proportional to n2.

797


?gebakTransforms eigenvectors of a balanced matrix tothose of the original nonsymmetric matrix.

Syntax

Fortran 77:

call sgebak(job, side, n, ilo, ihi, scale, m, v, ldv, info)

call dgebak(job, side, n, ilo, ihi, scale, m, v, ldv, info)

call cgebak(job, side, n, ilo, ihi, scale, m, v, ldv, info)

call zgebak(job, side, n, ilo, ihi, scale, m, v, ldv, info)

Fortran 95:

call gebak(v, scale [,ilo] [,ihi] [,job] [,side] [,info])

Description

This routine is intended to be used after a matrix A has been balanced by a call to ?gebal, andeigenvectors of the balanced matrix A''22 have subsequently been computed. For a descriptionof balancing, see ?gebal. The balanced matrix A'' is obtained as A''= D*P*A*PT*inv(D),where P is a permutation matrix and D is a diagonal scaling matrix. This routine transforms theeigenvectors as follows:

if x is a right eigenvector of A'', then PT*inv(D)*x is a right eigenvector of A; if x is a lefteigenvector of A'', then PT*D*y is a left eigenvector of A.

Input Parameters

CHARACTER*1. Must be 'N' or 'P' or 'S' or 'B'. The sameparameter job as supplied to ?gebal.

job

CHARACTER*1. Must be 'L' or 'R'.sideIf side = 'L' , then left eigenvectors are transformed.If side = 'R', then right eigenvectors are transformed.

INTEGER. The number of rows of the matrix of eigenvectors

(n ≥ 0).

n

INTEGER. The values ilo and ihi, as returned by ?gebal.

(If n > 0, then 1 ≤ ilo ≤ ihi ≤ n;

ilo, ihi

798


if n = 0, then ilo = 1 and ihi = 0.)

REAL for single-precision flavorsscaleDOUBLE PRECISION for double-precision flavorsArray, DIMENSION at least max(1, n).Contains details of the permutations and/or the scalingfactors used to balance the original general matrix, asreturned by ?gebal.

INTEGER. The number of columns of the matrix of

eigenvectors (m ≥ 0).

m

REAL for sgebakvDOUBLE PRECISION for dgebakCOMPLEX for cgebakDOUBLE COMPLEX for zgebak.Arrays:v (ldv,*) contains the matrix of left or right eigenvectorsto be transformed.The second dimension of v must be at least max(1, m).

INTEGER. The first dimension of v; at least max(1, n).ldv

Output Parameters

Overwritten by the transformed eigenvectors.v




Specific details for the routine gebak interface are the following:

Holds the matrix V of size (n,m).v




799




Application Notes

The errors in this routine are negligible.

The approximate number of floating-point operations is approximately proportional to m*n.

?hseqrComputes all eigenvalues and (optionally) theSchur factorization of a matrix reduced toHessenberg form.

Syntax

Fortran 77:

call shseqr(job, compz, n, ilo, ihi, h, ldh, wr, wi, z, ldz, work, lwork,info)

call dhseqr(job, compz, n, ilo, ihi, h, ldh, wr, wi, z, ldz, work, lwork,info)

call chseqr(job, compz, n, ilo, ihi, h, ldh, w, z, ldz, work, lwork, info)

call zhseqr(job, compz, n, ilo, ihi, h, ldh, w, z, ldz, work, lwork, info)

Fortran 95:

call hseqr(h, wr, wi [,ilo] [,ihi] [,z] [,job] [,compz] [,info])

call hseqr(h, w [,ilo] [,ihi] [,z] [,job] [,compz] [,info])

Description

This routine computes all the eigenvalues, and optionally the Schur factorization, of an upperHessenberg matrix H: H = ZTZH, where T is an upper triangular (or, for real flavors,quasi-triangular) matrix (the Schur form of H), and Z is the unitary or orthogonal matrix whosecolumns are the Schur vectors zi.

You can also use this routine to compute the Schur factorization of a general matrix A whichhas been reduced to upper Hessenberg form H:

A = QHQH, where Q is unitary (orthogonal for real flavors);

800


A = (QZ)H(QZ)H.

In this case, after reducing A to Hessenberg form by ?gehrd, call ?orghr to form Q explicitlyand then pass Q to ?hseqr with compz = 'V'.

You can also call ?gebal to balance the original matrix before reducing it to Hessenberg formby ?hseqr, so that the Hessenberg matrix H will have the structure:

where H11 and H33 are upper triangular.

If so, only the central diagonal block H22 (in rows and columns ilo to ihi) needs to be furtherreduced to Schur form (the blocks H12 and H23 are also affected). Therefore the values of iloand ihi can be supplied to ?hseqr directly. Also, after calling this routine you must call ?gebakto permute the Schur vectors of the balanced matrix to those of the original matrix.

If ?gebal has not been called, however, then ilo must be set to 1 and ihi to n. Note that ifthe Schur factorization of A is required, ?gebal must not be called with job = 'S' or 'B',because the balancing transformation is not unitary (for real flavors, it is not orthogonal).

?hseqr uses a multishift form of the upper Hessenberg QR algorithm. The Schur vectors arenormalized so that ||zi||2 = 1, but are determined only to within a complex factor of absolutevalue 1 (for the real flavors, to within a factor ±1).

Input Parameters

CHARACTER*1. Must be 'E' or 'S'.jobIf job = 'E', then eigenvalues only are required.If job = 'S', then the Schur form T is required.

CHARACTER*1. Must be 'N' or 'I' or 'V'.compzIf compz = 'N', then no Schur vectors are computed (andthe array z is not referenced).If compz = 'I', then the Schur vectors of H are computed(and the array z is initialized by the routine).

801


If compz = 'V', then the Schur vectors of A are computed(and the array z must contain the matrix Q on entry).

INTEGER. The order of the matrix H (n ≥ 0).n

INTEGER. If A has been balanced by ?gebal, then ilo andihi must contain the values returned by ?gebal. Otherwise,ilo must be set to 1 and ihi to n.

ilo, ihi

REAL for shseqrh, z, workDOUBLE PRECISION for dhseqrCOMPLEX for chseqrDOUBLE COMPLEX for zhseqr.Arrays:h(ldh,*) The n-by-n upper Hessenberg matrix H.The second dimension of h must be at least max(1, n).z(ldz,*)If compz = 'V', then z must contain the matrix Q from thereduction to Hessenberg form.If compz = 'I', then z need not be set.If compz = 'N', then z is not referenced.The second dimension of z must beat least max(1, n) if compz = 'V' or 'I';at least 1 if compz = 'N'.work(lwork) is a workspace array.The dimension of work must be at least max (1, n).

INTEGER. The first dimension of h; at least max(1, n).ldh

INTEGER. The first dimension of z;ldz

If compz = 'N', then ldz ≥ 1.

If compz = 'V' or 'I', then ldz ≥ max(1, n).


lwork ≥ max(1, n)is sufficient, but lwork typically aslarge as 6*n may be required for optimal performance. Aworkspace query to determine the optimal workspace sizeis recommended.

802


If lwork = -1, then a workspace query is assumed; theroutine only estimates the optimal size of the work array,returns this value as the first entry of the work array, andno error message related to lwork is issued by xerbla. SeeApplication Notes for details.

Output Parameters

COMPLEX for chseqrwDOUBLE COMPLEX for zhseqr.Array, DIMENSION at least max (1, n). Contains thecomputed eigenvalues, unless info>0. The eigenvalues arestored in the same order as on the diagonal of the Schurform T (if computed).

REAL for shseqrwr, wiDOUBLE PRECISION for dhseqrArrays, DIMENSION at least max (1, n) each.Contain the real and imaginary parts, respectively, of thecomputed eigenvalues, unless info > 0. Complex conjugatepairs of eigenvalues appear consecutively with theeigenvalue having positive imaginary part first. Theeigenvalues are stored in the same order as on the diagonalof the Schur form T (if computed).

If info = 0 and job = 'S', h contains the upper triangularmatrix T from the Schur decomposition (the Schur form).

h

If info = 0 and job = 'E', the contents of h areunspecified on exit. (The output value of h when info >0 is given under the description of info below.)

If compz = 'V' and info = 0, then z contains Q*Z.zIf compz = 'I' and info = 0, then z contains the unitaryor orthogonal matrix Z of the Schur vectors of H.If compz = 'N', then z is not referenced.

On exit, if info = 0, then work(1) returns the optimallwork.

work(1)


803


If info = i, elements 1,2, ..., ilo-1 and i+1, i+2, ..., nof wr and wi contain the real and imaginary parts of thoseeigenvalues that have been succesively found.If info > 0, and job = 'E', then on exit, the remainingunconverged eigenvalues are the eigenvalues of the upperHessenberg matrix rows and columns ilo through info ofthe final output value of H.If info > 0, and job = 'S', then on exit (initial value ofH)*U = U*(final value of H), where U is a unitary matrix. Thefinalvalue of H is upper Hessenberg and triangular in rowsand columns info+1 through ihi.If info > 0, and compz = 'V', then on exit (final valueof Z) = (initial value of Z)*U, where U is the unitary matrix(regardless of the value of job).If info > 0, and compz = 'I', then on exit (final valueof Z) = U, where U is the unitary matrix (regardless of thevalue of job).If info > 0, and compz = 'N', then Z is not accessed.



Specific details for the routine hseqr interface are the following:

Holds the matrix H of size (n,n).h

Holds the vector of length (n). Used in real flavors only.wr

Holds the vector of length (n). Used in real flavors only.wi

Holds the vector of length (n). Used in complex flavors only.w


Must be 'E' or 'S'. The default value is 'E'.job


compz


804


Application Notes

The computed Schur factorization is the exact factorization of a nearby matrix H + E, where

||E||2 < O(ε) ||H||2/si, and ε is the machine precision.

If λi is an exact eigenvalue, and μ is the corresponding computed value, then |λi - μi|≤

c(n)ε||H||2/si where c(n) is a modestly increasing function of n, and si is the reciprocal

condition number of λi. You can compute the condition numbers si by calling ?trsna.

The total number of floating-point operations depends on how rapidly the algorithm converges;typical numbers are as follows.

7n3 for real flavorsIf only eigenvaluesare computed: 25n3 for complex flavors.

10n3 for real flavorsIf the Schur form iscomputed: 35n3 for complex flavors.

20n3 for real flavorsIf the full Schurfactorization iscomputed:

70n3 for complex flavors.





805

?hseinComputes selected eigenvectors of an upperHessenberg matrix that correspond to specifiedeigenvalues.

Syntax

Fortran 77:

call shsein(job, eigsrc, initv, select, n, h, ldh, wr, wi, vl, ldvl, vr, ldvr,mm, m, work, ifaill, ifailr, info)

call dhsein(job, eigsrc, initv, select, n, h, ldh, wr, wi, vl, ldvl, vr, ldvr,mm, m, work, ifaill, ifailr, info)

call chsein(job, eigsrc, initv, select, n, h, ldh, w, vl, ldvl, vr, ldvr, mm,m, work, rwork, ifaill, ifailr, info)

call zhsein(job, eigsrc, initv, select, n, h, ldh, w, vl, ldvl, vr, ldvr, mm,m, work, rwork, ifaill, ifailr, info)

Fortran 95:

call hsein(h, wr, wi, select [, vl] [,vr] [,ifaill] [,ifailr] [,initv][,eigsrc] [,m] [,info])

call hsein(h, w, select [,vl] [,vr] [,ifaill] [,ifailr] [,initv] [,eigsrc][,m] [,info])

Description

This routine computes left and/or right eigenvectors of an upper Hessenberg matrix H,corresponding to selected eigenvalues.

The right eigenvector x and the left eigenvector y, corresponding to an eigenvalue λ, are defined

by: Hx = λx and yHH = λyH (or HHy = λ*y). Here λ* denotes the conjugate of λ.

The eigenvectors are computed by inverse iteration. They are scaled so that, for a realeigenvector x, max|xi| = 1, and for a complex eigenvector, max(|Rexi| + |Imxi|) = 1.

If H has been formed by reduction of a general matrix A to upper Hessenberg form, theneigenvectors of H may be transformed to eigenvectors of A by ?ormhr or ?unmhr.

806


Input Parameters

CHARACTER*1. Must be 'R' or 'L' or 'B'.jobIf job = 'R', then only right eigenvectors are computed.If job = 'L', then only left eigenvectors are computed.If job = 'B', then all eigenvectors are computed.

CHARACTER*1. Must be 'Q' or 'N'.eigsrcIf eigsrc = 'Q', then the eigenvalues of H were foundusing ?hseqr; thus if H has any zero sub-diagonal elements(and so is block triangular), then the j-th eigenvalue canbe assumed to be an eigenvalue of the block containing thej-th row/column. This property allows the routine to performinverse iteration on just one diagonal block. If eigsrc ='N', then no such assumption is made and the routineperforms inverse iteration using the whole matrix.

CHARACTER*1. Must be 'N' or 'U'.initvIf initv = 'N', then no initial estimates for the selectedeigenvectors are supplied.If initv = 'U', then initial estimates for the selectedeigenvectors are supplied in vl and/or vr.

LOGICAL.selectArray, DIMENSION at least max (1, n). Specifies whicheigenvectors are to be computed.For real flavors:To obtain the real eigenvector corresponding to the realeigenvalue wr(j), set select(j) to .TRUE.To select the complex eigenvector corresponding to thecomplex eigenvalue (wr(j), wi(j)) with complex conjugate(wr(j+1), wi(j+1)), set select(j) and/or select(j+1) to.TRUE.; the eigenvector corresponding to the firsteigenvalue in the pair is computed.For complex flavors:To select the eigenvector corresponding to the eigenvaluew(j), set select(j) to .TRUE.


REAL for shseinh, vl, vr,DOUBLE PRECISION for dhsein

807


COMPLEX for chseinDOUBLE COMPLEX for zhsein.Arrays:h(ldh,*) The n-by-n upper Hessenberg matrix H.The second dimension of h must be at least max(1, n).(ldvl,*)If initv = 'V' and job = 'L' or 'B', then vl mustcontain starting vectors for inverse iteration for the lefteigenvectors. Each starting vector must be stored in thesame column or columns as will be used to store thecorresponding eigenvector.If initv = 'N', then vl need not be set.The second dimension of vl must be at least max(1, mm) ifjob = 'L' or 'B' and at least 1 if job = 'R'.The array vl is not referenced if job = 'R'.vr(ldvr,*)If initv = 'V' and job = 'R' or 'B', then vr mustcontain starting vectors for inverse iteration for the righteigenvectors. Each starting vector must be stored in thesame column or columns as will be used to store thecorresponding eigenvector.If initv = 'N', then vr need not be set.The second dimension of vr must be at least max(1, mm) ifjob = 'R' or 'B' and at least 1 if job = 'L'.The array vr is not referenced if job = 'L'.work(*) is a workspace array.DIMENSION at least max (1, n*(n+2)) for real flavors andat least max (1, n*n) for complex flavors.


COMPLEX for chseinwDOUBLE COMPLEX for zhsein.Array, DIMENSION at least max (1, n).Contains the eigenvalues of the matrix H.If eigsrc = 'Q', the array must be exactly as returned by?hseqr.

REAL for shseinwr, wiDOUBLE PRECISION for dhseinArrays, DIMENSION at least max (1, n) each.

808


Contain the real and imaginary parts, respectively, of theeigenvalues of the matrix H. Complex conjugate pairs ofvalues must be stored in consecutive elements of the arrays.If eigsrc = 'Q', the arrays must be exactly as returnedby ?hseqr.

INTEGER. The first dimension of vl.ldvl

If job = 'L' or 'B', ldvl ≥ max(1,n).

If job = 'R', ldvl ≥ 1.

INTEGER. The first dimension of vr.ldvr

If job = 'R' or 'B', ldvr ≥ max(1,n).

If job = 'L', ldvr ≥1.

INTEGER. The number of columns in vl and/or vr.mmMust be at least m, the actual number of columns required(see Output Parameters below).For real flavors, m is obtained by counting 1 for each selectedreal eigenvector and 2 for each selected complex eigenvector(see select).For complex flavors, m is the number of selectedeigenvectors (see select).Constraint:

0 ≤ mm ≤ n.

REAL for chseinrworkDOUBLE PRECISION for zhsein.Array, DIMENSION at least max (1, n).

Output Parameters

Overwritten for real flavors only.selectIf a complex eigenvector was selected as specified above,then select(j) is set to .TRUE. and select(j+1) to.FALSE.

The real parts of some elements of w may be modified, asclose eigenvalues are perturbed slightly in searching forindependent eigenvectors.

w

809


Some elements of wr may be modified, as close eigenvaluesare perturbed slightly in searching for independenteigenvectors.

wr

If job = 'L' or 'B', vl contains the computed lefteigenvectors (as specified by select).

vl, vr

If job = 'R' or 'B', vr contains the computed righteigenvectors (as specified by select).The eigenvectors are stored consecutively in the columnsof the array, in the same order as their eigenvalues.For real flavors: a real eigenvector corresponding to aselected real eigenvalue occupies one column; a complexeigenvector corresponding to a selected complex eigenvalueoccupies two columns: the first column holds the real partand the second column holds the imaginary part.

INTEGER. For real flavors: the number of columns of vland/or vr required to store the selected eigenvectors.

m

For complex flavors: the number of selected eigenvectors.

INTEGER.ifaill, ifailrArrays, DIMENSION at least max(1, mm) each.ifaill(i) = 0 if the ith column of vl converged;ifaill(i) = j > 0 if the eigenvector stored in the i-thcolumn of vl (corresponding to the jth eigenvalue) failedto converge.ifailr(i) = 0 if the ith column of vr converged;ifailr(i) = j > 0 if the eigenvector stored in the i-thcolumn of vr (corresponding to the jth eigenvalue) failedto converge.For real flavors: if the ith and (i+1)th columns of vl containa selected complex eigenvector, then ifaill(i) andifaill(i+1) are set to the same value. A similar rule holdsfor vr and ifailr.The array ifaill is not referenced if job = 'R'. The arrayifailr is not referenced if job = 'L'.


810


If info > 0, then i eigenvectors (as indicated by theparameters ifaill and/or ifailr above) failed to converge.The corresponding columns of vl and/or vr contain no usefulinformation.



Specific details for the routine hsein interface are the following:





Holds the vector of length (n).select

Holds the matrix VL of size (n,mm).vl

Holds the matrix VR of size (n,mm).vr

Holds the vector of length (mm). Note that there will be an error conditionif ifaill is present and vl is omitted.

ifaill

Holds the vector of length (mm). Note that there will be an error conditionif ifailr is present and vr is omitted.

ifailr

Must be 'N' or 'U'. The default value is 'N'.initv

Must be 'N' or 'Q'. The default value is 'N'.eigsrc

Restored based on the presence of arguments vl and vr as follows:jobjob = 'B', if both vl and vr are present,job = 'L', if vl is present and vr omitted,job = 'R', if vl is omitted and vr present,Note that there will be an error condition if both vl and vr are omitted.

Application Notes

Each computed right eigenvector x i is the exact eigenvector of a nearby matrix A + Ei, such

that ||Ei|| < O(ε)||A||. Hence the residual is small:

||Axi - λixi|| = O(ε)||A||.

811

However, eigenvectors corresponding to close or coincident eigenvalues may not accuratelyspan the relevant subspaces.

Similar remarks apply to computed left eigenvectors.

?trevcComputes selected eigenvectors of an upper(quasi-) triangular matrix computed by ?hseqr.

Syntax

Fortran 77:

call strevc(side, howmny, select, n, t, ldt, vl, ldvl, vr, ldvr, mm, m, work,info)

call dtrevc(side, howmny, select, n, t, ldt, vl, ldvl, vr, ldvr, mm, m, work,info)

call ctrevc(side, howmny, select, n, t, ldt, vl, ldvl, vr, ldvr, mm, m, work,rwork, info)

call ztrevc(side, howmny, select, n, t, ldt, vl, ldvl, vr, ldvr, mm, m, work,rwork, info)

Fortran 95:

call trevc(t [, howmny] [,select] [,vl] [,vr] [,m] [,info])

Description

This routine computes some or all of the right and/or left eigenvectors of an upper triangularmatrix T (or, for real flavors, an upper quasi-triangular matrix T). Matrices of this type areproduced by the Schur factorization of a general matrix: A = Q*T*QH, as computed by ?hseqr.

The right eigenvector x and the left eigenvector y of T corresponding to an eigenvalue w, aredefined by:

T*x = w*x, yH*T = w*yH, where yH denotes the conjugate transpose of y.

The eigenvalues are not input to this routine, but are read directly from the diagonal blocks ofT.

This routine returns the matrices X and/or Y of right and left eigenvectors of T, or the productsQ*X and/or Q*Y, where Q is an input matrix.

812


If Q is the orthogonal/unitary factor that reduces a matrix A to Schur form T, then Q*X and Q*Yare the matrices of right and left eigenvectors of A.

Input Parameters

CHARACTER*1. Must be 'R' or 'L' or 'B'.sideIf side = 'R' , then only right eigenvectors are computed.If side = 'L' , then only left eigenvectors are computed.If side = 'B', then all eigenvectors are computed.

CHARACTER*1. Must be 'A' or 'B' or 'S'.howmnyIf howmny = 'A' , then all eigenvectors (as specified byside) are computed.If howmny = 'B' , then all eigenvectors (as specified byside) are computed and backtransformed by the matricessupplied in vl and vr.If howmny = 'S' , then selected eigenvectors (as specifiedby side and select) are computed.

LOGICAL.selectArray, DIMENSION at least max (1, n).If howmny = 'S', select specifies which eigenvectors areto be computed.If howmny = 'A' or 'B', select is not referenced.For real flavors:If omega(j) is a real eigenvalue, the corresponding realeigenvector is computed if select(j) is .TRUE..If omega(j) and omega(j+1) are the real and imaginary partsof a complex eigenvalue, the corresponding complexeigenvector is computed if either select(j) or select(j+1)is .TRUE., and on exit select(j) is set to .TRUE.andselect(j+1) is set to .FALSE..For complex flavors:The eigenvector corresponding to the j-th eigenvalue iscomputed if select(j) is .TRUE..


REAL for strevct, vl, vr,DOUBLE PRECISION for dtrevcCOMPLEX for ctrevcDOUBLE COMPLEX for ztrevc.

813


Arrays:t(ldt,*) contains the n-by-n matrix T in Schur canonicalform.The second dimension of t must be at least max(1, n).vl(ldvl,*)If howmny = 'B' and side = 'L' or 'B', then vl mustcontain an n-by-n matrix Q (usually the matrix of Schurvectors returned by ?hseqr).If howmny = 'A' or 'S', then vl need not be set.The second dimension of vl must be at least max(1, mm) ifside = 'L' or 'B' and at least 1 if side = 'R'.The array vl is not referenced if side = 'R'.vr (ldvr,*)If howmny = 'B' and side = 'R' or 'B', then vr mustcontain an n-by-n matrix Q (usually the matrix of Schurvectors returned by ?hseqr). .If howmny = 'A' or 'S', then vr need not be set.The second dimension of vr must be at least max(1, mm) ifside = 'R' or 'B' and at least 1 if side = 'L'.The array vr is not referenced if side = 'L'.work(*) is a workspace array.DIMENSION at least max (1, 3*n) for real flavors and atleast max (1, 2*n) for complex flavors.

INTEGER. The first dimension of t; at least max(1, n).ldt


If side = 'L' or 'B', ldvl ≥ n.

If side = 'R', ldvl ≥ 1.


If side = 'R' or 'B', ldvr ≥ n.

If side = 'L', ldvr ≥ 1.

INTEGER. The number of columns in the arrays vl and/orvr. Must be at least m (the precise number of columnsrequired).

mm

If howmny = 'A' or 'B', m = n.

814


If howmny = 'S': for real flavors, m is obtained by counting1 for each selected real eigenvector and 2 for each selectedcomplex eigenvector;for complex flavors, m is the number of selected eigenvectors(see select).

Constraint: 0 ≤ m ≤ n.

REAL for ctrevcrworkDOUBLE PRECISION for ztrevc.Workspace array, DIMENSION at least max (1, n).

Output Parameters

If a complex eigenvector of a real matrix was selected asspecified above, then select(j) is set to .TRUE. andselect(j+1) to .FALSE.

select

If side = 'L' or 'B', vl contains the computed lefteigenvectors (as specified by howmny and select).

vl, vr

If side = 'R' or 'B', vr contains the computed righteigenvectors (as specified by howmny and select).The eigenvectors are stored consecutively in the columnsof the array, in the same order as their eigenvalues.For real flavors: corresponding to each real eigenvalue is areal eigenvector, occupying one column;corresponding toeach complex conjugate pair of eigenvalues is a complexeigenvector, occupying two columns; the first column holdsthe real part and the second column holds the imaginarypart.

INTEGER.mFor complex flavors: the number of selected eigenvectors.If howmny = 'A' or 'B', m is set to n.For real flavors: the number of columns of vl and/or vractually used to store the selected eigenvectors.If howmny = 'A' or 'B', m is set to n.


815




Specific details for the routine trevc interface are the following:

Holds the matrix T of size (n,n).t




If omitted, this argument is restored based on the presence ofarguments vl and vr as follows:

side

side = 'B', if both vl and vr are present,side = 'L', if vr is omitted,side = 'R', if vl is omitted.Note that there will be an error condition if both vl and vr are omitted.

If omitted, this argument is restored based on the presence of argumentselect as follows:

howmny

howmny = 'V', if q is present,howmny = 'N', if q is omitted.If present, vect = 'V' or 'U' and the argument q must also bepresent.Note that there will be an error condition if both select and howmnyare present.

Application Notes

If x i is an exact right eigenvector and yi is the corresponding computed eigenvector, then the

angle θ(yi, xi) between them is bounded as follows: θ(yi,xi)≤(c(n)ε||T||2)/sepi wheresepi is the reciprocal condition number of xi. The condition number sepi may be computed bycalling ?trsna.

816


?trsnaEstimates condition numbers for specifiedeigenvalues and right eigenvectors of an upper(quasi-) triangular matrix.

Syntax

Fortran 77:

call strsna(job, howmny, select, n, t, ldt, vl, ldvl, vr, ldvr, s, sep, mm,m, work, ldwork, iwork, info)

call dtrsna(job, howmny, select, n, t, ldt, vl, ldvl, vr, ldvr, s, sep, mm,m, work, ldwork, iwork, info)

call ctrsna(job, howmny, select, n, t, ldt, vl, ldvl, vr, ldvr, s, sep, mm,m, work, ldwork, rwork, info)

call ztrsna(job, howmny, select, n, t, ldt, vl, ldvl, vr, ldvr, s, sep, mm,m, work, ldwork, rwork, info)

Fortran 95:

call trsna(t [, s] [,sep] [,vl] [,vr] [,select] [,m] [,info])

Description

This routine estimates condition numbers for specified eigenvalues and/or right eigenvectorsof an upper triangular matrix T (or, for real flavors, upper quasi-triangular matrix T in canonicalSchur form). These are the same as the condition numbers of the eigenvalues and righteigenvectors of an original matrix A = Z*T*ZH (with unitary or, for real flavors, orthogonal Z),from which T may have been derived.

The routine computes the reciprocal of the condition number of an eigenvalue lambda(i) as si= |vHu|/(||u||E||v||E), where u and v are the right and left eigenvectors of T, respectively,corresponding to lambda(i). This reciprocal condition number always lies between zero(ill-conditioned) and one (well-conditioned).

An approximate error estimate for a computed eigenvalue lambda(i) is then given by ε||T||/si,


817


To estimate the reciprocal of the condition number of the right eigenvector corresponding tolambda(i), the routine first calls ?trexc to reorder the eigenvalues so that lambda(j) is in theleading position:

The reciprocal condition number of the eigenvector is then estimated as sepi, the smallestsingular value of the matrix T22 - lambda(i)*I. This number ranges from zero (ill-conditioned)to very large (well-conditioned).

An approximate error estimate for a computed right eigenvector u corresponding to lambda(i)

is then given by ε||T||/sepi.

Input Parameters

CHARACTER*1. Must be 'E' or 'V' or 'B'.jobIf job = 'E', then condition numbers for eigenvalues onlyare computed.If job = 'V', then condition numbers for eigenvectors onlyare computed.If job = 'B', then condition numbers for both eigenvaluesand eigenvectors are computed.

CHARACTER*1. Must be 'A' or 'S'.howmnyIf howmny = 'A' , then the condition numbers for alleigenpairs are computed.If howmny = 'S' , then condition numbers for selectedeigenpairs (as specified by select) are computed.

LOGICAL.selectArray, DIMENSION at least max (1, n) if howmny = 'S' andat least 1 otherwise.Specifies the eigenpairs for which condition numbers are tobe computed if howmny= 'S'.For real flavors:

818


To select condition numbers for the eigenpair correspondingto the real eigenvalue lambda(j), select(j) must be set.TRUE.;to select condition numbers for the eigenpair correspondingto a complex conjugate pair of eigenvalues lambda(j) andlambda(j+1), select(j) and/or select(j+1) must be set.TRUE.For complex flavorsTo select condition numbers for the eigenpair correspondingto the eigenvalue lambda(j), select(j) must be set .TRUE.select is not referenced if howmny = 'A'.


REAL for strsnat, vl, vr, workDOUBLE PRECISION for dtrsnaCOMPLEX for ctrsnaDOUBLE COMPLEX for ztrsna.Arrays:t(ldt,*) contains the n-by-n matrix T.The second dimension of t must be at least max(1, n).vl(ldvl,*)If job = 'E' or 'B', then vl must contain the lefteigenvectors of T (or of any matrix Q*T*QH with Q unitaryor orthogonal) corresponding to the eigenpairs specified byhowmny and select. The eigenvectors must be stored inconsecutive columns of vl, as returned by ?trevc or?hsein.The second dimension of vl must be at least max(1, mm) ifjob = 'E' or 'B' and at least 1 if job = 'V'.The array vl is not referenced if job = 'V'.vr(ldvr,*)If job = 'E' or 'B', then vr must contain the righteigenvectors of T (or of any matrix Q*T*QH with Q unitaryor orthogonal) corresponding to the eigenpairs specified byhowmny and select. The eigenvectors must be stored inconsecutive columns of vr, as returned by ?trevc or?hsein.

819


The second dimension of vr must be at least max(1, mm) ifjob = 'E' or 'B' and at least 1 if job = 'V'.The array vr is not referenced if job = 'V'.work is a workspace array, its dimension (ldwork,n+6).The array work is not referenced if job = 'E'.



If job = 'E' or 'B', ldvl ≥ max(1,n).

If job = 'V', ldvl ≥ 1.


If job = 'E' or 'B', ldvr ≥ max(1,n).

If job = 'R', ldvr ≥ 1.

INTEGER. The number of elements in the arrays s and sep,and the number of columns in vl and vr (if used). Must beat least m (the precise number required).

mm

If howmny = 'A', m = n;if howmny = 'S', for real flavors m is obtained by counting1 for each selected real eigenvalue and 2 for each selectedcomplex conjugate pair of eigenvalues.for complex flavors m is the number of selected eigenpairs(see select). Constraint:

0 ≤ m ≤ n.

INTEGER. The first dimension of work.ldwork

If job = 'V' or 'B', ldwork ≥ max(1,n).

If job = 'E', ldwork ≥ 1.

REAL for ctrsna, ztrsna.rworkArray, DIMENSION at least max (1, n).

INTEGER for strsna, dtrsna.iworkArray, DIMENSION at least max (1, n).

Output Parameters

REAL for single-precision flavorssDOUBLE PRECISION for double-precision flavors.

820


Array, DIMENSION at least max(1, mm) if job = 'E' or 'B'and at least 1 if job = 'V'.Contains the reciprocal condition numbers of the selectedeigenvalues if job = 'E' or 'B', stored in consecutiveelements of the array. Thus s(j), sep(j) and the j-thcolumns of vl and vr all correspond to the same eigenpair(but not in general the j th eigenpair unless all eigenpairshave been selected).For real flavors: for a complex conjugate pair of eigenvalues,two consecutive elements of S are set to the same value.The array s is not referenced if job = 'V'.

REAL for single-precision flavorssepDOUBLE PRECISION for double-precision flavors.Array, DIMENSION at least max(1, mm) if job = 'V' or 'B'and at least 1 if job = 'E'. Contains the estimatedreciprocal condition numbers of the selected righteigenvectors if job = 'V' or 'B', stored in consecutiveelements of the array.For real flavors: for a complex eigenvector, two consecutiveelements of sep are set to the same value; if the eigenvaluescannot be reordered to compute sep(j), then sep(j) is setto zero; this can only occur when the true value would bevery small anyway. The array sep is not referenced if job= 'E'.

INTEGER.mFor complex flavors: the number of selected eigenpairs.If howmny = 'A', m is set to n.For real flavors: the number of elements of s and/or sepactually used to store the estimated condition numbers.If howmny = 'A', m is set to n.




821


Specific details for the routine trsna interface are the following:


Holds the vector of length (mm).s

Holds the vector of length (mm).sep




Restored based on the presence of arguments s and sep as follows:jobjob = 'B', if both s and sep are present,job = 'E', if s is present and sep omitted,job = 'V', if s is omitted and sep present.Note an error condition if both s and sep are omitted.

Restored based on the presence of the argument select as follows:howmnyhowmny = 'S', if select is present,howmny = 'A', if select is omitted.

Note that the arguments s, vl, and vr must either be all present or all omitted.

Otherwise, an error condition is observed.

Application Notes

The computed values sepi may overestimate the true value, but seldom by a factor of morethan 3.

?trexcReorders the Schur factorization of a generalmatrix.

Syntax

Fortran 77:

call strexc(compq, n, t, ldt, q, ldq, ifst, ilst, work, info)

call dtrexc(compq, n, t, ldt, q, ldq, ifst, ilst, work, info)

call ctrexc(compq, n, t, ldt, q, ldq, ifst, ilst, info)

call ztrexc(compq, n, t, ldt, q, ldq, ifst, ilst, info)

822


Fortran 95:

call trexc(t, ifst, ilst [,q] [,info])

Description

This routine reorders the Schur factorization of a general matrix A = Q*T*QH, so that thediagonal element or block of T with row index ifst is moved to row ilst.

The reordered Schur form S is computed by an unitary (or, for real flavors, orthogonal) similaritytransformation: S = ZH*T*Z. Optionally the updated matrix P of Schur vectors is computed asP = Q*Z, giving A = P*S*PH.

Input Parameters

CHARACTER*1. Must be 'V' or 'N'.compqIf compq = 'V', then the Schur vectors (Q) are updated.If compq = 'N', then no Schur vectors are updated.


REAL for strexct, qDOUBLE PRECISION for dtrexcCOMPLEX for ctrexcDOUBLE COMPLEX for ztrexc.Arrays:t(ldt,*) contains the n-by-n matrix T.The second dimension of t must be at least max(1, n).q(ldq,*)If compq = 'V', then q must contain Q (Schur vectors).If compq = 'N', then q is not referenced.The second dimension of q must be at least max(1, n)if compq = 'V' and at least 1 if compq = 'N'.


INTEGER. The first dimension of q;ldq

If compq = 'N', then ldq≥ 1.

If compq = 'V', then ldq≥ max(1, n).

INTEGER. 1 ≤ ifst ≤ n; 1 ≤ ilst ≤ n.ifst, ilst

823


Must specify the reordering of the diagonal elements (orblocks, which is possible for real flavors) of the matrix T.The element (or block) with row index ifst is moved to rowilst by a sequence of exchanges between adjacentelements (or blocks).

REAL for strexcworkDOUBLE PRECISION for dtrexc.Array, DIMENSION at least max (1, n).

Output Parameters

Overwritten by the updated matrix S.t

If compq = 'V', q contains the updated matrix of Schurvectors.

q

Overwritten for real flavors only.ifst, ilstIf ifst pointed to the second row of a 2 by 2 block on entry,it is changed to point to the first row; ilst always pointsto the first row of the block in its final position (which maydiffer from its input value by ±1).




Specific details for the routine trexc interface are the following:



Restored based on the presence of the argument q as follows:compqcompq = 'V', if q is present,compq = 'N', if q is omitted.

824


Application Notes

The computed matrix S is exactly similar to a matrix T + E, where ||E||2 = O(ε)||T||2, and

ε is the machine precision.

Note that if a 2 by 2 diagonal block is involved in the re-ordering, its off-diagonal elements arein general changed; the diagonal elements and the eigenvalues of the block are unchangedunless the block is sufficiently ill-conditioned, in which case they may be noticeably altered. Itis possible for a 2 by 2 block to break into two 1 by 1 blocks, that is, for a pair of complexeigenvalues to become purely real.

The values of eigenvalues however are never changed by the re-ordering.


6n(ifst-ilst) if compq = 'N';for real flavors:12n(ifst-ilst) if compq = 'V';

20n(ifst-ilst) if compq = 'N';for complex flavors:40n(ifst-ilst) if compq = 'V'.

?trsenReorders the Schur factorization of a matrix and(optionally) computes the reciprocal conditionnumbers and invariant subspace for the selectedcluster of eigenvalues.

Syntax

Fortran 77:

call strsen(job, compq, select, n, t, ldt, q, ldq, wr, wi, m, s, sep, work,lwork, iwork, liwork, info)

call dtrsen(job, compq, select, n, t, ldt, q, ldq, wr, wi, m, s, sep, work,lwork, iwork, liwork, info)

call ctrsen(job, compq, select, n, t, ldt, q, ldq, w, m, s, sep, work, lwork,info)

call ztrsen(job, compq, select, n, t, ldt, q, ldq, w, m, s, sep, work, lwork,info)

825


Fortran 95:

call trsen(t, select [,wr] [,wi] [,m] [,s] [,sep] [,q] [,info])

call trsen(t, select [,w] [,m] [,s] [,sep] [,q] [,info])

Description

This routine reorders the Schur factorization of a general matrix A = Q*T*QH so that a selectedcluster of eigenvalues appears in the leading diagonal elements (or, for real flavors, diagonalblocks) of the Schur form. The reordered Schur form R is computed by an unitary (orthogonal)similarity transformation: R = ZH*T*Z. Optionally the updated matrix P of Schur vectors iscomputed as P = Q*Z, giving A = P*R*PH.

Let

where the selected eigenvalues are precisely the eigenvalues of the leading m-by-m submatrixT11. Let P be correspondingly partitioned as (Q 1 Q2) where Q1 consists of the first m columnsof Q. Then A*Q 1 = Q1T11, and so the m columns of Q1 form an orthonormal basis for theinvariant subspace corresponding to the selected cluster of eigenvalues.

Optionally the routine also computes estimates of the reciprocal condition numbers of theaverage of the cluster of eigenvalues and of the invariant subspace.

Input Parameters

CHARACTER*1. Must be 'N' or 'E' or 'V' or 'B'.jobIf job = 'N', then no condition numbers are required.If job = 'E', then only the condition number for the clusterof eigenvalues is computed.If job = 'V', then only the condition number for theinvariant subspace is computed.If job = 'B', then condition numbers for both the clusterand the invariant subspace are computed.

CHARACTER*1. Must be 'V' or 'N'.compqIf compq = 'V', then Q of the Schur vectors is updated.

826


If compq = 'N', then no Schur vectors are updated.

LOGICAL.selectArray, DIMENSION at least max (1, n).Specifies the eigenvalues in the selected cluster. To selectan eigenvalue lambda(j), select(j) must be .TRUE.For real flavors: to select a complex conjugate pair ofeigenvalues lambda(j) and lambda(j+1) (corresponding 2by 2 diagonal block), select(j) and/or select(j+1) mustbe .TRUE.; the complex conjugate lambda(j)andlambda(j+1) must be either both included in the cluster orboth excluded.


REAL for strsent, q, workDOUBLE PRECISION for dtrsenCOMPLEX for ctrsenDOUBLE COMPLEX for ztrsen.Arrays:t (ldt,*) The n-by-n T.The second dimension of t must be at least max(1, n).

q (ldq,*)If compq = 'V', then q must contain Q of Schur vectors.If compq = 'N', then q is not referenced.The second dimension of q must be at least max(1, n) ifcompq = 'V' and at least 1 if compq = 'N'.work is a workspace array, its dimension max(1, lwork).



If compq = 'N', then ldq ≥ 1.

If compq = 'V', then ldq ≥ max(1, n).


If job = 'V' or 'B', lwork ≥ max(1,2*m*(n-m)).

If job = 'E', then lwork ≥ max(1, m*(n-m))

827


If job = 'N', then lwork ≥ 1 for complex flavors and

lwork ≥ max(1,n) for real flavors.If lwork = -1, then a workspace query is assumed; theroutine only calculates the optimal size of the work array,returns this value as the first entry of the work array, andno error message related to lwork is issued by xerbla. SeeApplication Notes for details.

INTEGER.iwork(liwork) is a workspace array. The arrayiwork is not referenced if job = 'N' or 'E'.

iwork

The actual amount of workspace required cannot exceedn2/2 if job = 'V' or 'B'.


If job = 'V' or 'B', liwork ≥ max(1,2m(n-m)).

If job = 'E' or 'E', liwork ≥ 1.If liwork = -1, then a workspace query is assumed; theroutine only calculates the optimal size of the iwork array,returns this value as the first entry of the iwork array, andno error message related to liwork is issued by xerbla.See Application Notes for details.

Output Parameters

Overwritten by the updated matrix R.t

If compq = 'V', q contains the updated matrix of Schurvectors; the first m columns of the Q form an orthogonalbasis for the specified invariant subspace.

q

COMPLEX for ctrsenwDOUBLE COMPLEX for ztrsen.Array, DIMENSION at least max(1, n). The recordedeigenvalues of R. The eigenvalues are stored in the sameorder as on the diagonal of R.

REAL for strsenwr, wiDOUBLE PRECISION for dtrsen

828


Arrays, DIMENSION at least max(1, n). Contain the real andimaginary parts, respectively, of the reordered eigenvaluesof R. The eigenvalues are stored in the same order as onthe diagonal of R. Note that if a complex eigenvalue issufficiently ill-conditioned, then its value may differsignificantly from its value before reordering.

INTEGER.mFor complex flavors: the number of the specified invariantsubspaces, which is the same as the number of selectedeigenvalues (see select).For real flavors: the dimension of the specified invariantsubspace. The value of m is obtained by counting 1 for eachselected real eigenvalue and 2 for each selected complexconjugate pair of eigenvalues (see select).

Constraint: 0 ≤ m ≤ n.

REAL for single-precision flavorssDOUBLE PRECISION for double-precision flavors.If job = 'E' or 'B', s is a lower bound on the reciprocalcondition number of the average of the selected cluster ofeigenvalues.If m = 0 or n, then s = 1.For real flavors: if info = 1, then s is set to zero.s is notreferenced if job = 'N' or 'V'.

REAL for single-precision flavors DOUBLE PRECISION fordouble-precision flavors.

sep

If job = 'V' or 'B', sep is the estimated reciprocalcondition number of the specified invariant subspace.If m = 0 or n, then sep = |T|.For real flavors: if info = 1, then sep is set to zero.sep is not referenced if job = 'N' or 'E'.

On exit, if info = 0, then work(1) returns the optimal sizeof lwork.

work(1)

On exit, if info = 0, then iwork(1) returns the optimalsize of liwork.

iwork(1)


829




Specific details for the routine trsen interface are the following:







Restored based on the presence of the argument q as follows: compq= 'V', if q is present, compq = 'N', if q is omitted.

compq

Restored based on the presence of arguments s and sep as follows:jobjob = 'B', if both s and sep are present,job = 'E', if s is present and sep omitted,job = 'V', if s is omitted and sep present,job = 'N', if both s and sep are omitted.

Application Notes

The computed matrix R is exactly similar to a matrix T + E, where ||E||2 = O(ε)||T||2, and

ε is the machine precision. The computed s cannot underestimate the true reciprocal conditionnumber by more than a factor of (min(m, n-m))1/2; sep may differ from the true value by

(m*n-m2)1/2. The angle between the computed invariant subspace and the true subspace is O(ε)||A||2/sep. Note that if a 2 by 2 diagonal block is involved in the re-ordering, its off-diagonalelements are in general changed; the diagonal elements and the eigenvalues of the block areunchanged unless the block is sufficiently ill-conditioned, in which case they may be noticeablyaltered. It is possible for a 2 by 2 block to break into two 1 by 1 blocks, that is, for a pair ofcomplex eigenvalues to become purely real. The values of eigenvalues however are neverchanged by the re-ordering.


830



If you set lwork = -1, the routine returns immediately and provides the recommendedworkspace in the first element of the corresponding array (work, iwork). This operation is calleda workspace query.


?trsylSolves Sylvester equation for real quasi-triangularor complex triangular matrices.

Syntax

Fortran 77:

call strsyl(trana, tranb, isgn, m, n, a, lda, b, ldb, c, ldc, scale, info)

call dtrsyl(trana, tranb, isgn, m, n, a, lda, b, ldb, c, ldc, scale, info)

call ctrsyl(trana, tranb, isgn, m, n, a, lda, b, ldb, c, ldc, scale, info)

call ztrsyl(trana, tranb, isgn, m, n, a, lda, b, ldb, c, ldc, scale, info)

Fortran 95:

call trsyl(a, b, c, scale [, trana] [,tranb] [,isgn] [,info])

Description

This routine solves the Sylvester matrix equation op(A)X ± X op(B) = αC, where op(A) =A or AH, and the matrices A and B are upper triangular (or, for real flavors, upper quasi-triangular

in canonical Schur form); α ≤ 1 is a scale factor determined by the routine to avoid overflowin X; A is m-by-m, B is n-by-n, and C and X are both m-by-n. The matrix X is obtained by astraightforward process of back substitution.

831


The equation has a unique solution if and only if αi ± βi ≠ 0, where {αi} and {βi} are theeigenvalues of A and B, respectively, and the sign (+ or -) is the same as that used in theequation to be solved.

Input Parameters

CHARACTER*1. Must be 'N' or 'T' or 'C'.tranaIf trana = 'N' , then op(A) = A.If trana = 'T', then op(A) = AT (real flavors only).If trana = 'C' then op(A) = AH.

CHARACTER*1. Must be 'N' or 'T' or 'C'.tranbIf tranb = 'N' , then op(B) = B.If tranb = 'T', then op(B) = BT (real flavors only).If tranb = 'C', then op(B) = BH.

INTEGER. Indicates the form of the Sylvester equation.isgnIf isgn = +1, op(A)*X + X*op(B) = alpha*C.If isgn = -1, op(A)*X - X*op(B) = alpha*C.

INTEGER. The order of A, and the number of rows in X and

C (m ≥ 0).

m

INTEGER. The order of B, and the number of columns in X

and C (n ≥ 0).

n

REAL for strsyla, b, cDOUBLE PRECISION for dtrsylCOMPLEX for ctrsylDOUBLE COMPLEX for ztrsyl.Arrays:a(lda,*) contains the matrix A.The second dimension of a must be at least max(1, m).b(ldb,*) contains the matrix B.The second dimension of b must be at least max(1, n).c(ldc,*) contains the matrix C.The second dimension of c must be at least max(1, n).



INTEGER. The first dimension of c; at least max(1, n).ldc

832


Output Parameters

Overwritten by the solution matrix X.c

REAL for single-precision flavorsscaleDOUBLE PRECISION for double-precision flavors.

The value of the scale factor α.


If info = -i, the i-th parameter had an illegal value.

If info = 1, A and B have common or close eigenvaluesperturbed values were used to solve the equation.



Specific details for the routine trsyl interface are the following:

Holds the matrix A of size (m,m).a



Must be 'N', 'C', or 'T'. The default value is 'N'.trana

Must be 'N', 'C', or 'T'. The default value is 'N'.tranb

Must be +1 or -1. The default value is +1.isgn

Application Notes

Let X be the exact, Y the corresponding computed solution, and R the residual matrix: R = C- (AY ± YB). Then the residual is always small:

||R||F = O(ε) (||A||F + ||B||F) ||Y||F.

However, Y is not necessarily the exact solution of a slightly perturbed equation; in other words,the solution is not backwards stable.

For the forward error, the following bound holds:

||Y - X||F ≤||R||F/sep(A, B)

833


but this may be a considerable overestimate. See [Golub96] for a definition of sep(A, B).

The approximate number of floating-point operations for real flavors is m*n*(m + n). For complexflavors it is 4 times greater.

Generalized Nonsymmetric Eigenvalue Problems

This section describes LAPACK routines for solving generalized nonsymmetric eigenvalueproblems, reordering the generalized Schur factorization of a pair of matrices, as well asperforming a number of related computational tasks.

A generalized nonsymmetric eigenvalue problem is as follows: given a pair ofnonsymmetric (or non-Hermitian) nby-n matrices A and B, find the generalized eigenvalues

λ and the corresponding generalized eigenvectors x and y that satisfy the equations

Ax = λBx (right generalized eigenvectors x)

and

yHA = λyHB (left generalized eigenvectors y).

Table 4-6 lists LAPACK routines (Fortran-77 interface) used to solve the generalizednonsymmetric eigenvalue problems and the generalized Sylvester equation. Respective routinenames in Fortran-95 interface are without the first symbol (see Routine Naming Conventions).

Table 4-6 Computational Routines for Solving Generalized Nonsymmetric EigenvalueProblems

Operation performedRoutinename

Reduces a pair of matrices to generalized upper Hessenberg form usingorthogonal/unitary transformations.

?gghrd

Balances a pair of general real or complex matrices.?ggbal

Forms the right or left eigenvectors of a generalized eigenvalue problem.?ggbak

Implements the QZ method for finding the generalized eigenvalues of the matrixpair (H,T).

?hgeqz

Computes some or all of the right and/or left generalized eigenvectors of a pairof upper triangular matrices

?tgevc

834


Operation performedRoutinename

Reorders the generalized Schur decomposition of a pair of matrices (A,B) sothat one diagonal block of (A,B) moves to another row index.

?tgexc

Reorders the generalized Schur decomposition of a pair of matrices (A,B) sothat a selected cluster of eigenvalues appears in the leading diagonal blocks of(A,B).

?tgsen

Solves the generalized Sylvester equation.?tgsyl

Estimates reciprocal condition numbers for specified eigenvalues and/oreigenvectors of a pair of matrices in generalized real Schur canonical form.

?tgsna

?gghrdReduces a pair of matrices to generalized upperHessenberg form using orthogonal/unitarytransformations.

Syntax

Fortran 77:

call sgghrd(compq, compz, n, ilo, ihi, a, lda, b, ldb, q, ldq, z, ldz, info)

call dgghrd(compq, compz, n, ilo, ihi, a, lda, b, ldb, q, ldq, z, ldz, info)

call cgghrd(compq, compz, n, ilo, ihi, a, lda, b, ldb, q, ldq, z, ldz, info)

call zgghrd(compq, compz, n, ilo, ihi, a, lda, b, ldb, q, ldq, z, ldz, info)

Fortran 95:

call gghrd(a, b [,ilo] [,ihi] [,q] [,z] [,compq] [,compz] [,info])

Description

This routine reduces a pair of real/complex matrices (A,B) to generalized upper Hessenbergform using orthogonal/unitary transformations, where A is a general matrix and B is upper

triangular. The form of the generalized eigenvalue problem is A*x = λ*B*x, and B is typicallymade upper triangular by computing its QR factorization and moving the orthogonal matrix Qto the left side of the equation.

835


This routine simultaneously reduces A to a Hessenberg matrix H:

QH*A*Z = H

and transforms B to another upper triangular matrix T:

QH*B*Z = T

in order to reduce the problem to its standard form H*y = λ*T*y, where y = ZH*x.

The orthogonal/unitary matrices Q and Z are determined as products of Givens rotations. Theymay either be formed explicitly, or they may be postmultiplied into input matrices Q1 and 1, sothat

Q1*A*Z*Z1H = (Q1*Q)*H*(Z1*Z)

H

Q1*B*Z1H = (Q1*Q)*T*(Z1*Z)

H

If Q1 is the orthogonal/unitary matrix from the QR factorization of B in the original equation Ax

= λBx, then the routine ?gghrd reduces the original problem to generalized Hessenberg form.

Input Parameters

CHARACTER*1. Must be 'N', 'I', or 'V'.compqIf compq = 'N', matrix Q is not computed.If compq = 'I', Q is initialized to the unit matrix, and theorthogonal/unitary matrix Q is returned;If compq = 'V', Q must contain an orthogonal/unitarymatrix Q1 on entry, and the product Q1*Q is returned.

CHARACTER*1. Must be 'N', 'I', or 'V'.compzIf compz = 'N', matrix Z is not computed.If compz = 'I', Z is initialized to the unit matrix, and theorthogonal/unitary matrix Z is returned;If compz = 'V', Z must contain an orthogonal/unitarymatrix Z1 on entry, and the product Z1*Z is returned.


INTEGER. ilo and ihi mark the rows and columns of Awhich are to be reduced. It is assumed that A is alreadyupper triangular in rows and columns 1:ilo-1 and ihi+1:n.

ilo, ihi

Values of ilo and ihi are normally set by a previous callto ?ggbal; otherwise they should be set to 1 and nrespectively.

836


Constraint:

If n > 0, then 1 ≤ ilo ≤ ihi ≤ n;if n = 0, then ilo = 1 and ihi = 0.

REAL for sgghrda, b, q, zDOUBLE PRECISION for dgghrdCOMPLEX for cgghrdDOUBLE COMPLEX for zgghrd.Arrays:a(lda,*) contains the n-by-n general matrix A. The seconddimension of a must be at least max(1, n).b(ldb,*) contains the n-by-n upper triangular matrix B.The second dimension of b must be at least max(1, n).q (ldq,*)If compq = 'N', then q is not referenced.If compq = 'V', then q must contain the orthogonal/unitarymatrix Q1, typically from the QR factorization of B.The second dimension of q must be at least max(1, n).z (ldz,*)If compq = 'N', then z is not referenced.If compq = 'V', then z must contain the orthogonal/unitarymatrix Z1.The second dimension of z must be at least max(1, n).





If compq = 'I'or 'V', then ldq ≥ max(1, n).


If compq = 'N', then ldz ≥ 1.

If compq = 'I'or 'V', then ldz ≥ max(1, n).

Output Parameters

On exit, the upper triangle and the first subdiagonal of Aare overwritten with the upper Hessenberg matrix H, andthe rest is set to zero.

a

837


On exit, overwritten by the upper triangular matrix T =QH*B*Z. The elements below the diagonal are set to zero.

b

If compq = 'I', then q contains the orthogonal/unitarymatrix Q, ;

q

If compq = 'V', then q is overwritten by the product Q1*Q.

If compq = 'I', then z contains the orthogonal/unitarymatrix Z;

z

If compq = 'V', then z is overwritten by the product Z1*Z.




Specific details for the routine gghrd interface are the following:







If omitted, this argument is restored based on the presence of argumentq as follows: compq = 'I', if q is present, compq = 'N', if q is omitted.

compq

If present, compq must be equal to 'I' or 'V' and the argument qmust also be present. Note that there will be an error condition if compqis present and q omitted.


compz


838


?ggbalBalances a pair of general real or complex matrices.

Syntax

Fortran 77:

call sggbal(job, n, a, lda, b, ldb, ilo, ihi, lscale, rscale, work, info)

call dggbal(job, n, a, lda, b, ldb, ilo, ihi, lscale, rscale, work, info)

call cggbal(job, n, a, lda, b, ldb, ilo, ihi, lscale, rscale, work, info)

call zggbal(job, n, a, lda, b, ldb, ilo, ihi, lscale, rscale, work, info)

Fortran 95:

call ggbal(a, b [,ilo] [,ihi] [,lscale] [,rscale] [,job] [,info])

Description

This routine balances a pair of general real/complex matrices (A,B). This involves, first, permutingA and B by similarity transformations to isolate eigenvalues in the first 1 to ilo-1 and lastihi+1 to n elements on the diagonal;and second, applying a diagonal similarity transformationto rows and columns ilo to ihi to make the rows and columns as close in norm as possible.Both steps are optional. Balancing may reduce the 1-norm of the matrices, and improve theaccuracy of the computed eigenvalues and/or eigenvectors in the generalized eigenvalue problem

Ax = λBx.

Input Parameters

CHARACTER*1. Specifies the operations to be performed onA and B. Must be 'N' or 'P' or 'S' or 'B'.

job

If job = 'N', then no operations are done; simply set ilo=1, ihi=n, lscale(i) =1.0 and rscale(i)=1.0 fori = 1,..., n.If job = 'P', then permute only.If job = 'S', then scale only.If job = 'B', then both permute and scale.


REAL for sggbala, b

839


DOUBLE PRECISION for dggbalCOMPLEX for cggbalDOUBLE COMPLEX for zggbal.Arrays:a(lda,*) contains the matrix A. The second dimension of amust be at least max(1, n).b(ldb,*) contains the matrix B. The second dimension of bmust be at least max(1, n).



REAL for single precision flavorsworkDOUBLE PRECISION for double precision flavors.Workspace array, DIMENSION at least max(1, 6n) whenjob = 'S'or 'B', or at least 1 when job = 'N'or 'P'.

Output Parameters

Overwritten by the balanced matrices A and B, respectively.a, bIf job = 'N', a and b are not referenced.

INTEGER. ilo and ihi are set to integers such that on exita(i,j)=0 and b(i,j)=0 if i>j and j=1,...,ilo-1 ori=ihi+1,..., n.

ilo, ihi

If job = 'N'or 'S', then ilo = 1 and ihi = n.

REAL for single precision flavorslscale, rscaleDOUBLE PRECISION for double precision flavors.Arrays, DIMENSION at least max(1, n).lscale contains details of the permutations and scalingfactors applied to the left side of A and B.If Pj is the index of the row interchanged with row j, andDj is the scaling factor applied to row j, thenlscale(J) = Pj, for j = 1,..., ilo-1= Dj, for j = ilo,...,ihi= Pj, for j = ihi+1,..., n.rscale contains details of the permutations and scalingfactors applied to the right side of A and B.If Pj is the index of the column interchanged with columnj, and Dj is the scaling factor applied to column j, then

840


rscale(j) = Pj, for j = 1,..., ilo-1= Dj, for j = ilo,...,ihi= Pj, for j = ihi+1,..., nThe order in which the interchanges are made is n to ihi+1,then 1 to ilo-1.




Specific details for the routine ggbal interface are the following:



Holds the vector of length (n).lscale

Holds the vector of length (n).rscale




?ggbakForms the right or left eigenvectors of a generalizedeigenvalue problem.

Syntax

Fortran 77:

call sggbak(job, side, n, ilo, ihi, lscale, rscale, m, v, ldv, info)

call dggbak(job, side, n, ilo, ihi, lscale, rscale, m, v, ldv, info)

call cggbak(job, side, n, ilo, ihi, lscale, rscale, m, v, ldv, info)

call zggbak(job, side, n, ilo, ihi, lscale, rscale, m, v, ldv, info)

841


Fortran 95:

call ggbak(v [, ilo] [,ihi] [,lscale] [,rscale] [,job] [,info])

Description

This routine forms the right or left eigenvectors of a real/complex generalized eigenvalueproblem

Ax = λBx

by backward transformation on the computed eigenvectors of the balanced pair of matricesoutput by ?ggbal.

Input Parameters

CHARACTER*1. Specifies the type of backward transformationrequired. Must be 'N', 'P', 'S', or 'B'.

job

If job = 'N', then no operations are done; return.If job = 'P', then do backward transformation forpermutation only.If job = 'S', then do backward transformation for scalingonly.If job = 'B', then do backward transformation for bothpermutation and scaling. This argument must be the sameas the argument job supplied to ?ggbal.

CHARACTER*1. Must be 'L' or 'R'.sideIf side = 'L' , then v contains left eigenvectors.If side = 'R' , then v contains right eigenvectors.

INTEGER. The number of rows of the matrix V (n ≥ 0).n

INTEGER. The integers ilo and ihi determined by ?gebal.Constraint:

ilo, ihi


REAL for single precision flavorslscale, rscaleDOUBLE PRECISION for double precision flavors.Arrays, DIMENSION at least max(1, n).The array lscale contains details of the permutations and/orscaling factors applied to the left side of A and B, as returnedby ?ggbal.

842


The array rscale contains details of the permutations and/orscaling factors applied to the right side of A and B, asreturned by ?ggbal.

INTEGER. The number of columns of the matrix Vm

(m ≥ 0).

REAL for sggbakvDOUBLE PRECISION for dggbakCOMPLEX for cggbakDOUBLE COMPLEX for zggbak.Array v(ldv,*). Contains the matrix of right or lefteigenvectors to be transformed, as returned by ?tgevc.The second dimension of v must be at least max(1, m).

INTEGER. The first dimension of v; at least max(1, n).ldv

Output Parameters

Overwritten by the transformed eigenvectorsv




Specific details for the routine ggbak interface are the following:

Holds the matrix V of size (n,m).v






If omitted, this argument is restored based on the presence ofarguments lscale and rscale as follows:

side

side = 'L', if lscale is present and rscale omitted,

843


side = 'R', if lscale is omitted and rscale present.Note that there will be an error condition if both lscale and rscaleare present or if they both are omitted.

?hgeqzImplements the QZ method for finding thegeneralized eigenvalues of the matrix pair (H,T).

Syntax

Fortran 77:

call shgeqz(job, compq, compz, n, ilo, ihi, h, ldh, t, ldt, alphar, alphai,beta, q, ldq, z, ldz, work, lwork, info)

call dhgeqz(job, compq, compz, n, ilo, ihi, h, ldh, t, ldt, alphar, alphai,beta, q, ldq, z, ldz, work, lwork, info)

call chgeqz(job, compq, compz, n, ilo, ihi, h, ldh, t, ldt, alpha, beta, q,ldq, z, ldz, work, lwork, rwork, info)

call zhgeqz(job, compq, compz, n, ilo, ihi, h, ldh, t, ldt, alpha, beta, q,ldq, z, ldz, work, lwork, rwork, info)

Fortran 95:

call hgeqz(h, t [,ilo] [,ihi] [,alphar] [,alphai] [,beta] [,q] [,z] [,job][,compq] [,compz] [,info])

call hgeqz(h, t [,ilo] [,ihi] [,alpha] [,beta] [,q] [,z] [,job] [,compq] [,compz] [,info])

Description

This routine computes the eigenvalues of a real/complex matrix pair (H,T), where H is an upperHessenberg matrix and T is upper triangular, using the double-shift version (for real flavors)or single-shift version (for complex flavors) of the QZ method. Matrix pairs of this type areproduced by the reduction to generalized upper Hessenberg form of a real/complex matrix pair(A,B):

A = Q1 *H* Z1H , B = Q1* T* Z1

H,

as computed by ?gghrd.

844


For real flavors:

If job = 'S', then the Hessenberg-triangular pair (H,T) is also reduced to generalized Schurform,

H = Q*S*ZT, T = Q *P*ZT,

where Q and Z are orthogonal matrices, P is an upper triangular matrix, and S is aquasi-triangular matrix with 1-by-1 and 2-by-2 diagonal blocks. The 1-by-1 blocks correspondto real eigenvalues of the matrix pair (H,T) and the 2-by-2 blocks correspond to complexconjugate pairs of eigenvalues.

Additionally, the 2-by-2 upper triangular diagonal blocks of P corresponding to 2-by-2 blocksof S are reduced to positive diagonal form, that is, if S(j+1,j) is non-zero, then P(j+1,j) =P(j,j+1) = 0, P(j,j) > 0, and P(j+1,j+1) > 0.

For complex flavors:

If job = 'S', then the Hessenberg-triangular pair (H,T) is also reduced to generalized Schurform,

H = Q* S*ZH, T = Q*P*ZH,

where Q and Z are unitary matrices, and S and P are upper triangular.

For all function flavors:

Optionally, the orthogonal/unitary matrix Q from the generalized Schur factorization may bepostmultiplied into an input matrix Q1, and the orthogonal/unitary matrix Z may be postmultipliedinto an input matrix Z1.

If Q1 and Z1 are the orthogonal/unitary matrices from ?gghrd that reduced the matrix pair (A,B)to generalized upper Hessenberg form, then the output matrices Q1Q and Z 1Z are theorthogonal/unitary factors from the generalized Schur factorization of (A,B):

A = (Q1Q)*S *(Z1Z)H, B = (Q1Q) *P *(Z1Z)

H.

To avoid overflow, eigenvalues of the matrix pair (H,T) (equivalently, of (A,B)) are computedas a pair of values (alpha,beta>). For chgeqz/zhgeqz, alpha and beta are complex, and for

shgeqz/dhgeqz, alpha is complex and beta real. If beta is nonzero, λ = alpha / beta isan eigenvalue of the generalized nonsymmetric eigenvalue problem (GNEP)

Ax = λBx

and if alpha is nonzero, μ = beta / alpha is an eigenvalue of the alternate form of the GNEP

845


μAy = By .

Real eigenvalues (for real flavors) or the values of alpha and beta for the i-th eigenvalue (forcomplex flavors) can be read directly from the generalized Schur form:

alpha = S(i,i), beta = P(i,i).

Input Parameters

CHARACTER*1. Specifies the operations to be performed.Must be 'E' or 'S'.

job

If job = 'E', then compute eigenvalues only;If job = 'S', then compute eigenvalues and the Schurform.

CHARACTER*1. Must be 'N', 'I', or 'V'.compqIf compq = 'N', left Schur vectors (q) are not computed;If compq = 'I', q is initialized to the unit matrix and thematrix of left Schur vectors of (H,T) is returned;If compq = 'V', q must contain an orthogonal/unitarymatrix Q1 on entry and the product Q1*Q is returned.

CHARACTER*1. Must be 'N', 'I', or 'V'.compzIf compz = 'N', right Schur vectors (z) are not computed;If compz = 'I', z is initialized to the unit matrix and thematrix of right Schur vectors of (H,T) is returned;If compz = 'V', z must contain an orthogonal/unitarymatrix Z1 on entry and the product Z1*Z is returned.

INTEGER. The order of the matrices H, T, Q, and Zn

(n ≥ 0).

INTEGER. ilo and ihi mark the rows and columns of Hwhich are in Hessenberg form. It is assumed that H isalready upper triangular in rows and columns 1:ilo-1 andihi+1:n.

ilo, ihi

Constraint:


REAL for shgeqzh, t, q, z, workDOUBLE PRECISION for dhgeqzCOMPLEX for chgeqz

846


DOUBLE COMPLEX for zhgeqz.Arrays:On entry, h(ldh,*) contains the n-by-n upper Hessenbergmatrix H.The second dimension of h must be at least max(1, n).On entry, t(ldt,*) contains the n-by-n upper triangularmatrix T.The second dimension of t must be at least max(1, n).q (ldq,*):On entry, if compq = 'V', this array contains theorthogonal/unitary matrix Q1 used in the reduction of (A,B)to generalized Hessenberg form.If compq = 'N', then q is not referenced.The second dimension of q must be at least max(1, n).z (ldz,*):On entry, if compz = 'V', this array contains theorthogonal/unitary matrix Z1 used in the reduction of (A,B)to generalized Hessenberg form.If compz = 'N', then z is not referenced.The second dimension of z must be at least max(1, n).work is a workspace array, its dimension max(1, lwork).





If compq = 'I'or 'V', then ldq ≥ max(1, n).


If compq = 'N', then ldz ≥ 1.

If compq = 'I'or 'V', then ldz ≥ max(1, n).


lwork ≥ max(1, n).

847


If lwork = -1, then a workspace query is assumed; theroutine only calculates the optimal size of the work array,returns this value as the first entry of the work array, andno error message related to lwork is issued by xerbla. SeeApplication Notes for details.

REAL for chgeqzrworkDOUBLE PRECISION for zhgeqz.Workspace array, DIMENSION at least max(1, n). Used incomplex flavors only.

Output Parameters

For real flavors:hIf job = 'S', then, on exit, h contains the upperquasi-triangular matrix S from the generalized Schurfactorization; 2-by-2 diagonal blocks (corresponding tocomplex conjugate pairs of eigenvalues) are returned instandard form, with h(i,i) = h(i+1, i+1) and h(i+1,i) * h(i, i+1) < 0.If job = 'E', then on exit the diagonal blocks of h matchthose of S, but the rest of h is unspecified.For complex flavors:If job = 'S', then, on exit, h contains the upper triangularmatrix S from the generalized Schur factorization.If job = 'E', then on exit the diagonal of h matches thatof S, but the rest of h is unspecified.

If job = 'S', then, on exit, t contains the upper triangularmatrix P from the generalized Schur factorization.

t

For real flavors:2-by-2 diagonal blocks of P corresponding to 2-by-2 blocksof S are reduced to positive diagonal form, that is, if h(j+1,j)is non-zero, then t(j+1,j)=t(j,j+1)=0 and t(j,j) andt(j+1,j+1) will be positive.If job = 'E', then on exit the diagonal blocks of t matchthose of P, but the rest of t is unspecified.For complex flavors:if job = 'E', then on exit the diagonal of t matches thatof P, but the rest of t is unspecified.

848

REAL for shgeqz;alphar, alphaiDOUBLE PRECISION for dhgeqz.Arrays, DIMENSION at least max(1, n). The real andimaginary parts, respectively, of each scalar alpha definingan eigenvalue of GNEP.If alphai(j) is zero, then the j-th eigenvalue is real; ifpositive, then the j-th and (j+1)-th eigenvalues are acomplex conjugate pair, withalphai(j+1) = -alphai(j).

COMPLEX for chgeqz;alphaDOUBLE COMPLEX for zhgeqz.Array, DIMENSION at least max(1, n).The complex scalars alpha that define the eigenvalues ofGNEP. alphai(i) = S(i,i) in the generalized Schurfactorization.

REAL for shgeqzbetaDOUBLE PRECISION for dhgeqzCOMPLEX for chgeqzDOUBLE COMPLEX for zhgeqz.Array, DIMENSION at least max(1, n).For real flavors:The scalars beta that define the eigenvalues of GNEP.Together, the quantities alpha = (alphar(j),alphai(j)) and beta = beta(j) represent the j-theigenvalue of the matrix pair (A,B), in one of the formslambda = alpha/beta or mu = beta/alpha. Since eitherlambda or mu may overflow, they should not, in general, becomputed.For complex flavors:The real non-negative scalars beta that define theeigenvalues of GNEP.beta(i) = P(i,i) in the generalized Schur factorization.Together, the quantities alpha = alpha(j) and beta =beta(j) represent the j-th eigenvalue of the matrix pair(A,B), in one of the forms lambda = alpha/beta or mu =beta/alpha. Since either lambda or mu may overflow, theyshould not, in general, be computed.

849


On exit, if compq = 'I', q is overwritten by theorthogonal/unitary matrix of left Schur vectors of the pair(H,T), and if compq = 'V', q is overwritten by theorthogonal/unitary matrix of left Schur vectors of (A,B).

q

On exit, if compz = 'I', z is overwritten by theorthogonal/unitary matrix of right Schur vectors of the pair(H,T), and if compz = 'V', z is overwritten by theorthogonal/unitary matrix of right Schur vectors of (A,B).

z

If info ≥ 0, on exit, work(1) contains the minimum valueof lwork required for optimum performance. Use this lworkfor subsequent runs.

work(1)

INTEGER.infoIf info = 0, the execution is successful.If info = -i, the ith parameter had an illegal value.If info = 1,..., n, the QZ iteration did not converge.(H,T) is not in Schur form, but alphar(i), alphai(i) (for realflavors), alpha(i) (for complex flavors), and beta(i),i=info+1,..., n should be correct.If info = n+1,...,2n, the shift calculation failed.(H,T) is not in Schur form, but alphar(i), alphai(i) (for realflavors), alpha(i) (for complex flavors), and beta(i), i=info-n+1,..., n should be correct.



Specific details for the routine hgeqz interface are the following:



Holds the vector of length (n). Used in real flavors only.alphar

Holds the vector of length (n). Used in real flavors only.alphai

Holds the vector of length (n). Used in complex flavors only.alpha

Holds the vector of length (n).beta

850






Must be 'E' or 'S'. The default value is 'E'.job

If omitted, this argument is restored based on the presence of argumentq as follows:

compq

compq = 'I', if q is present,compq = 'N', if q is omitted.If present, compq must be equal to 'I' or 'V' and the argument qmust also be present.Note that there will be an error condition if compq is present and qomitted.


compz

compz = 'I', if z is present,compz = 'N', if z is omitted.If present, compz must be equal to 'I' or 'V' and the argument zmust also be present.Note an error condition if compz is present and z is omitted.

Application Notes





851


?tgevcComputes some or all of the right and/or leftgeneralized eigenvectors of a pair of uppertriangular matrices.

Syntax

Fortran 77:

call stgevc(side, howmny, select, n, s, lds, p, ldp, vl, ldvl, vr, ldvr, mm,m, work, info)

call dtgevc(side, howmny, select, n, s, lds, p, ldp, vl, ldvl, vr, ldvr, mm,m, work, info)

call ctgevc(side, howmny, select, n, s, lds, p, ldp, vl, ldvl, vr, ldvr, mm,m, work, rwork, info)

call ztgevc(side, howmny, select, n, s, lds, p, ldp, vl, ldvl, vr, ldvr, mm,m, work, rwork, info)

Fortran 95:

call tgevc(s, p [,howmny] [,select] [,vl] [,vr] [,m] [,info])

Description

This routine computes some or all of the right and/or left eigenvectors of a pair of real/complexmatrices (S,P), where S is quasi-triangular (for real flavors) or upper triangular (for complexflavors) and P is upper triangular.

Matrix pairs of this type are produced by the generalized Schur factorization of a real/complexmatrix pair (A,B):

A = Q*S*ZH, B = Q*P*ZH

as computed by ?gghrd plus ?hgeqz.

The right eigenvector x and the left eigenvector y of (S,P) corresponding to an eigenvalue ware defined by:

S*x = w*P*x, yH*S = w*yH*P

The eigenvalues are not input to this routine, but are computed directly from the diagonalblocks or diagonal elements of S and P.

852


This routine returns the matrices X and/or Y of right and left eigenvectors of (S,P), or theproducts Z*X and/or Q*Y, where Z and Q are input matrices.

If Q and Z are the orthogonal/unitary factors from the generalized Schur factorization of a matrixpair (A,B), then Z*X and Q*Y are the matrices of right and left eigenvectors of (A,B).

Input Parameters

CHARACTER*1. Must be 'R', 'L', or 'B'.sideIf side = 'R' , compute right eigenvectors only.If side = 'L' , compute left eigenvectors only.If side = 'B', compute both right and left eigenvectors.

CHARACTER*1. Must be 'A' , 'B', or 'S'.howmnyIf howmny = 'A' , compute all right and/or lefteigenvectors.If howmny = 'B' , compute all right and/or lefteigenvectors, backtransformed by the matrices in vr and/orvl.If howmny = 'S' , compute selected right and/or lefteigenvectors, specified by the logical array select.

LOGICAL.selectArray, DIMENSION at least max (1, n).If howmny = 'S', select specifies the eigenvectors to becomputed.If howmny = 'A'or 'B', select is not referenced.For real flavors:If omega(j) is a real eigenvalue, the corresponding realeigenvector is computed if select(j) is .TRUE..If omega(j) and omega(j+1) are the real and imaginary partsof a complex eigenvalue, the corresponding complexeigenvector is computed if either select(j) or select(j+1)is .TRUE., and on exit select(j) is set to .TRUE.andselect(j+1) is set to .FALSE..For complex flavors:The eigenvector corresponding to the j-th eigenvalue iscomputed if select(j) is .TRUE..

INTEGER. The order of the matrices S and P (n ≥ 0).n

REAL for stgevcs, p, vl, vr, work

853


DOUBLE PRECISION for dtgevcCOMPLEX for ctgevcDOUBLE COMPLEX for ztgevc.Arrays:s(lds,*) contains the matrix S from a generalized Schurfactorization as computed by ?hgeqz. This matrix is upperquasi-triangular for real flavors, and upper triangular forcomplex flavors.The second dimension of s must be at least max(1, n).p(ldp,*) contains the upper triangular matrix P from ageneralized Schur factorization as computed by ?hgeqz.For real flavors, 2-by-2 diagonal blocks of P correspondingto 2-by-2 blocks of S must be in positive diagonal form.For complex flavors, P must have real diagonal elements.The second dimension of p must be at least max(1, n).If side = 'L' or 'B' and howmny = 'B', vl(ldvl,*) mustcontain an n-by-n matrix Q (usually the orthogonal/unitarymatrix Q of left Schur vectors returned by ?hgeqz). Thesecond dimension of vl must be at least max(1, mm).If side = 'R' , vl is not referenced.If side = 'R' or 'B' and howmny = 'B', vr(ldvr,*) mustcontain an n-by-n matrix Z (usually the orthogonal/unitarymatrix Z of right Schur vectors returned by ?hgeqz). Thesecond dimension of vr must be at least max(1, mm).If side = 'L', vr is not referenced.work(*) is a workspace array.DIMENSION at least max (1, 6*n) for real flavors and atleast max (1, 2*n) for complex flavors.

INTEGER. The first dimension of s; at least max(1, n).lds

INTEGER. The first dimension of p; at least max(1, n).ldp

INTEGER. The first dimension of vl;ldvl

If side = 'L' or 'B', then ldvl ≥n.

If side = 'R', then ldvl ≥ 1.

INTEGER. The first dimension of vr;ldvr

If side = 'R' or 'B', then ldvr ≥n.

If side = 'L', then ldvr ≥ 1.

854


INTEGER. The number of columns in the arrays vl and/or

vr (mm ≥ m).

mm

REAL for ctgevc DOUBLE PRECISION for ztgevc.Workspace array, DIMENSION at least max (1, 2*n). Usedin complex flavors only.

rwork

Output Parameters

On exit, if side = 'L' or 'B', vl contains:vlif howmny = 'A', the matrix Y of left eigenvectors of (S,P);if howmny = 'B', the matrix Q*Y;if howmny = 'S', the left eigenvectors of (S,P) specified byselect, stored consecutively in the columns of vl, in thesame order as their eigenvalues.For real flavors:A complex eigenvector corresponding to a complexeigenvalue is stored in two consecutive columns, the firstholding the real part, and the second the imaginary part.

On exit, if side = 'R' or 'B', vr contains:vrif howmny = 'A', the matrix X of right eigenvectors of (S,P);if howmny = 'B', the matrix Z*X;if howmny = 'S', the right eigenvectors of (S,P) specifiedby select, stored consecutively in the columns of vr, in thesame order as their eigenvalues.For real flavors:A complex eigenvector corresponding to a complexeigenvalue is stored in two consecutive columns, the firstholding the real part, and the second the imaginary part.

INTEGER. The number of columns in the arrays vl and/orvr actually used to store the eigenvectors.

m

If howmny = 'A' or 'B', m is set to n.For real flavors:Each selected real eigenvector occupies one column andeach selected complex eigenvector occupies two columns.For complex flavors:Each selected eigenvector occupies one column.


855


If info = -i, the i-th parameter had an illegal value.For real flavors:if info = i>0, the 2-by-2 block (i:i+1) does not have acomplex eigenvalue.



Specific details for the routine tgevc interface are the following:

Holds the matrix S of size (n,n).s

Holds the matrix P of size (n,n).p




Restored based on the presence of arguments vl and vr as follows:sideside = 'B', if both vl and vr are present,side = 'L', if vl is present and vr omitted,side = 'R', if vl is omitted and vr present,Note that there will be an error condition if both vl and vr are omitted.

If omitted, this argument is restored based on the presence of argumentselect as follows:

howmny

howmny = 'S', if select is present,howmny = 'A', if select is omitted.If present, howmny must be equal to 'A' or 'B' and the argumentselect must be omitted.Note that there will be an error condition if both howmny and selectare present.

856


?tgexcReorders the generalized Schur decomposition ofa pair of matrices (A,B) so that one diagonal blockof (A,B) moves to another row index.

Syntax

Fortran 77:

call stgexc(wantq, wantz, n, a, lda, b, ldb, q, ldq, z, ldz, ifst, ilst, work,lwork, info)

call dtgexc(wantq, wantz, n, a, lda, b, ldb, q, ldq, z, ldz, ifst, ilst, work,lwork, info)

call ctgexc(wantq, wantz, n, a, lda, b, ldb, q, ldq, z, ldz, ifst, ilst, info)

call ztgexc(wantq, wantz, n, a, lda, b, ldb, q, ldq, z, ldz, ifst, ilst, info)

Fortran 95:

call tgexc(a, b [,ifst] [,ilst] [,z] [,q] [,info])

Description

This routine reorders the generalized real-Schur/Schur decomposition of a real/complex matrixpair (A,B) using an orthogonal/unitary equivalence transformation

(A, B) = Q (A, B) ZH,

so that the diagonal block of (A, B) with row index ifst is moved to row ilst. Matrix pair (A,B) must be in a generalized real-Schur/Schur canonical form (as returned by ?gges), that is,A is block upper triangular with 1-by-1 and 2-by-2 diagonal blocks and B is upper triangular.Optionally, the matrices Q and Z of generalized Schur vectors are updated.

Q(in) * A(in) * Z(in)' = Q(out) * A(out) * Z(out)'

Q(in) * B(in) * Z(in)' = Q(out) * B(out) * Z(out)'.

Input Parameters

LOGICAL.wantq, wantzIf wantq = .TRUE., update the left transformation matrixQ;If wantq = .FALSE., do not update Q;

857


If wantz = .TRUE., update the right transformation matrixZ;If wantz = .FALSE., do not update Z.


REAL for stgexca, b, q,DOUBLE PRECISION for dtgexcCOMPLEX for ctgexcDOUBLE COMPLEX for ztgexc.Arrays:a(lda,*) contains the matrix A.The second dimension of a must be at least max(1, n).b(ldb,*) contains the matrix B. The second dimension of bmust be at least max(1, n).q (ldq,*)If wantq = .FALSE., then q is not referenced.If wantq = .TRUE., then q must contain theorthogonal/unitary matrix Q.The second dimension of q must be at least max(1, n).z (ldz,*)If wantz = .FALSE., then z is not referenced.If wantz = .TRUE., then z must contain theorthogonal/unitary matrix Z.The second dimension of z must be at least max(1, n).




If wantq = .FALSE., then ldq ≥ 1.

If wantq = .TRUE., then ldq ≥ max(1, n).


If wantz = .FALSE., then ldz ≥ 1.

If wantz = .TRUE., then ldz ≥ max(1, n).

INTEGER. Specify the reordering of the diagonal blocks of(A, B). The block with row index ifst is moved to row ilst,by a sequence of swapping between adjacent blocks.

Constraint: 1 ≤ ifst, ilst ≤ n.

ifst, ilst

858


REAL for stgexc;workDOUBLE PRECISION for dtgexc.Workspace array, DIMENSION (lwork). Used in real flavorsonly.

INTEGER. The dimension of work; must be at least 4n +16.lworkIf lwork = -1, then a workspace query is assumed; theroutine only calculates the optimal size of the work array,returns this value as the first entry of the work array, andno error message related to lwork is issued by xerbla. SeeApplication Notes for details.

Output Parameters

Overwritten by the updated matrices A and B.a, b

Overwritten for real flavors only.ifst, ilstIf ifst pointed to the second row of a 2 by 2 block on entry,it is changed to point to the first row; ilst always pointsto the first row of the block in its final position (which maydiffer from its input value by ±1).

INTEGER.infoIf info = 0, the execution is successful.If info = -i, the i-th parameter had an illegal value.If info = 1, the transformed matrix pair (A, B) would betoo far from generalized Schur form; the problem isill-conditioned. (A, B) may have been partially reordered,and ilst points to the first row of the current position ofthe block being moved.



Specific details for the routine tgexc interface are the following:




859



Restored based on the presence of the argument q as follows:wantqwantq = .TRUE, if q is present,wantq = .FALSE, if q is omitted.

Restored based on the presence of the argument z as follows:wantzwantz = .TRUE, if z is present,wantz = .FALSE, if z is omitted.

Application Notes





860


?tgsenReorders the generalized Schur decomposition ofa pair of matrices (A,B) so that a selected clusterof eigenvalues appears in the leading diagonalblocks of (A,B).

Syntax

Fortran 77:

call stgsen(ijob, wantq, wantz, select, n, a, lda, b, ldb, alphar, alphai,beta, q, ldq, z, ldz, m, pl, pr, dif, work, lwork, iwork, liwork, info)

call dtgsen(ijob, wantq, wantz, select, n, a, lda, b, ldb, alphar, alphai,beta, q, ldq, z, ldz, m, pl, pr, dif, work, lwork, iwork, liwork, info)

call ctgsen(ijob, wantq, wantz, select, n, a, lda, b, ldb, alpha, beta, q,ldq, z, ldz, m, pl, pr, dif, work, lwork, iwork, liwork, info)

call ztgsen(ijob, wantq, wantz, select, n, a, lda, b, ldb, alpha, beta, q,ldq, z, ldz, m, pl, pr, dif, work, lwork, iwork, liwork, info)

Fortran 95:

call tgsen(a, b, select [,alphar] [,alphai] [,beta] [,ijob] [,q] [,z] [,pl][,pr] [,dif] [,m] [,info])

call tgsen(a, b, select [,alpha] [,beta] [,ijob] [,q] [,z] [,pl] [,pr] [, dif][,m] [,info])

Description

This routine reorders the generalized real-Schur/Schur decomposition of a real/complex matrixpair (A, B) (in terms of an orthogonal/unitary equivalence transformation Q *(A, B) * Z),so that a selected cluster of eigenvalues appears in the leading diagonal blocks of the pair (A,B). The leading columns of Q and Z form orthonormal/unitary bases of the corresponding leftand right eigenspaces (deflating subspaces).

(A, B) must be in generalized real-Schur/Schur canonical form (as returned by ?gges), that is,A and B are both upper triangular.

?tgsen also computes the generalized eigenvalues

ωj = (alphar(j) + alphai(j)*i)/beta(j) (for real flavors)

861


ωj = alpha(j)/beta(j) (for complex flavors)

of the reordered matrix pair (A, B).

Optionally, the routine computes the estimates of reciprocal condition numbers for eigenvaluesand eigenspaces. These are Difu[(A11, B11), (A22, B22)] and Difl[(A11, B11), (A22, B22)], that is,the separation(s) between the matrix pairs (A11, B11) and (A22, B22) that correspond to theselected cluster and the eigenvalues outside the cluster, respectively, and norms of "projections"onto left and right eigenspaces with respect to the selected cluster in the (1,1)-block.

Input Parameters

INTEGER. Specifies whether condition numbers are requiredfor the cluster of eigenvalues (pl and pr) or the deflatingsubspaces Difu and Difl.

ijob

If ijob =0, only reorder with respect to select;If ijob =1, reciprocal of norms of "projections" onto leftand right eigenspaces with respect to the selected cluster(pl and pr);If ijob =2, compute upper bounds on Difu and Difl, usingF-norm-based estimate (dif (1:2));If ijob =3, compute estimate of Difu and Difl, sing1-norm-based estimate (dif (1:2)). This option is about 5times as expensive as ijob =2;If ijob =4,>compute pl, pr and dif (i.e., options 0, 1 and2 above). This is an economic version to get it all;If ijob =5, compute pl, pr and dif (i.e., options 0, 1 and3 above).

LOGICAL.wantq, wantzIf wantq = .TRUE., update the left transformation matrixQ;If wantq = .FALSE., do not update Q;If wantz = .TRUE., update the right transformation matrixZ;If wantz = .FALSE., do not update Z.

LOGICAL.selectArray, DIMENSION at least max (1, n). Specifies theeigenvalues in the selected cluster.

862


To select an eigenvalue omega(j), select(j) must be.TRUE. For real flavors: to select a complex conjugate pairof eigenvalues omega(j) and omega(j+1) (corresponding 2by 2 diagonal block), select(j) and/or select(j+1) mustbe set to .TRUE.; the complex conjugate omega(j) andomega(j+1) must be either both included in the cluster orboth excluded.


REAL for stgsena, b, q, z, workDOUBLE PRECISION for dtgsenCOMPLEX for ctgsenDOUBLE COMPLEX for ztgsen.Arrays:a(lda,*) contains the matrix A.For real flavors: A is upper quasi-triangular, with (A, B) ingeneralized real Schur canonical form.For complex flavors: A is upper triangular, in generalizedSchur canonical form.The second dimension of a must be at least max(1, n).b(ldb,*) contains the matrix B.For real flavors: B is upper triangular, with (A, B) ingeneralized real Schur canonical form.For complex flavors: B is upper triangular, in generalizedSchur canonical form. The second dimension of b must beat least max(1, n).q (ldq,*)If wantq = .TRUE., then q is an n-by-n matrix;If wantq = .FALSE., then q is not referenced.The second dimension of q must be at least max(1, n).z (ldz,*)If wantz = .TRUE., then z is an n-by-n matrix;If wantz = .FALSE., then z is not referenced.The second dimension of z must be at least max(1, n).work is a workspace array, its dimension max(1,lwork).If ijob=0, work is not referenced.


863



INTEGER. The first dimension of q; ldq ≥ 1.ldq

If wantq = .TRUE., then ldq ≥ max(1, n).

INTEGER. The first dimension of z; ldz ≥ 1.ldz

If wantz = .TRUE., then ldz ≥ max(1, n).

INTEGER. The dimension of the array work.lworkFor real flavors:

If ijob = 1, 2, or 4, lwork ≥ max(4n+16, 2m(n-m)).

If ijob = 3 or 5, lwork ≥ max(4n+16, 4m(n-m)).For complex flavors:

If ijob = 1, 2, or 4, lwork ≥ max(1, 2m(n-m)).

If ijob = 3 or 5, lwork ≥ max(1, 4m(n-m)).If lwork = -1, then a workspace query is assumed; theroutine only calculates the optimal size of the work array,returns this value as the first entry of the work array, andno error message related to lwork is issued by xerbla. SeeApplication Notes for details.

INTEGER. Workspace array, its dimension max(1, liwork).iworkIf ijob =0, iwork is not referenced.

INTEGER. The dimension of the array iwork.liworkFor real flavors:

If ijob = 1, 2, or 4, liwork ≥ n+6.

If ijob = 3 or 5, liwork ≥ max(n+6, 2m(n-m)).For complex flavors:

If ijob = 1, 2, or 4, liwork ≥ n+2.

If ijob = 3 or 5, liwork ≥ max(n+2, 2m(n-m)).If liwork = -1, then a workspace query is assumed; theroutine only calculates the optimal size of the iwork array,returns this value as the first entry of the iwork array, andno error message related to liwork is issued by xerbla.See Application Notes for details.

864


Output Parameters

Overwritten by the reordered matrices A and B, respectively.a, b

REAL for stgsen;alphar, alphaiDOUBLE PRECISION for dtgsen.Arrays, DIMENSION at least max(1, n). Contain values thatform generalized eigenvalues in real flavors.See beta.

COMPLEX for ctgsen;alphaDOUBLE COMPLEX for ztgsen.Array, DIMENSION at least max(1, n). Contain values thatform generalized eigenvalues in complex flavors.See beta.

REAL for stgsenbetaDOUBLE PRECISION for dtgsenCOMPLEX for ctgsenDOUBLE COMPLEX for ztgsen.Array, DIMENSION at least max(1, n).For real flavors:On exit, (alphar(j) + alphai(j)*i)/beta(j), j=1,..., n, willbe the generalized eigenvalues.alphar(j) + alphai(j)*i and beta(j), j=1,..., n are thediagonals of the complex Schur form (S,T) that would resultif the 2-by-2 diagonal blocks of the real generalized Schurform of (A,B) were further reduced to triangular form usingcomplex unitary transformations.If alphai(j) is zero, then the j-th eigenvalue is real; ifpositive, then the j-th and (j+1)-st eigenvalues are acomplex conjugate pair, with alphai(j+1) negative.For complex flavors:The diagonal elements of A and B, respectively, when thepair (A,B) has been reduced to generalized Schur form.alpha(i)/beta(i), i=1,..., n are the generalized eigenvalues.

If wantq =.TRUE., then, on exit, Q has been postmultipliedby the left orthogonal transformation matrix which reorder(A, B). The leading m columns of Q form orthonormal basesfor the specified pair of left eigenspaces (deflatingsubspaces).

q

865


If wantz =.TRUE., then, on exit, Z has been postmultipliedby the left orthogonal transformation matrix which reorder(A, B). The leading m columns of Z form orthonormal basesfor the specified pair of left eigenspaces (deflatingsubspaces).

z

INTEGER.mThe dimension of the specified pair of left and right

eigen-spaces (deflating subspaces); 0 ≤ m ≤ n.

REAL for single precision flavors;pl, prDOUBLE PRECISION for double precision flavors.If ijob = 1, 4, or 5, pl and pr are lower bounds on thereciprocal of the norm of "projections" onto left and righteigenspaces with respect to the selected cluster.

0 < pl, pr ≤ 1. If m = 0 or m = n, pl = pr = 1.If ijob = 0, 2 or 3, pl and pr are not referenced

REAL for single precision flavors;DOUBLE PRECISION fordouble precision flavors.

dif

Array, DIMENSION (2).

If ijob ≥ 2, dif(1:2) store the estimates ofDifu and Difl.If ijob = 2 or 4, dif(1:2) are F-norm-based upper boundson Difu and Difl.If ijob = 3 or 5, dif(1:2) are 1-norm-based estimates ofDifu and Difl.If m = 0 or n, dif(1:2) = F-norm([A, B]).If ijob = 0 or 1, dif is not referenced.

If ijob is not 0 and info = 0, on exit, work(1) containsthe minimum value of lwork required for optimumperformance. Use this lwork for subsequent runs.

work(1)

If ijob is not 0 and info = 0, on exit, iwork(1) containsthe minimum value of liwork required for optimumperformance. Use this liwork for subsequent runs.

iwork(1)


866

If info = 1, Reordering of (A, B) failed because thetransformed matrix pair (A, B) would be too far fromgeneralized Schur form; the problem is very ill-conditioned.(A, B) may have been partially reordered.If requested, 0 is returned in dif(*), pl and pr.



Specific details for the routine tgsen interface are the following:










Holds the vector of length (2).dif

Must be 0, 1, 2, 3, 4, or 5. The default value is 0.ijob

Restored based on the presence of the argument q as follows:wantqwantq = .TRUE, if q is present,wantq = .FALSE, if q is omitted.

Restored based on the presence of the argument z as follows:wantzwantz = .TRUE, if z is present,wantz = .FALSE, if z is omitted.

Application Notes


867





?tgsylSolves the generalized Sylvester equation.

Syntax

Fortran 77:

call stgsyl(trans, ijob, m, n, a, lda, b, ldb, c, ldc, d, ldd, e, lde, f, ldf,scale, dif, work, lwork, iwork, info)

call dtgsyl(trans, ijob, m, n, a, lda, b, ldb, c, ldc, d, ldd, e, lde, f, ldf,scale, dif, work, lwork, iwork, info)

call ctgsyl(trans, ijob, m, n, a, lda, b, ldb, c, ldc, d, ldd, e, lde, f, ldf,scale, dif, work, lwork, iwork, info)

call ztgsyl(trans, ijob, m, n, a, lda, b, ldb, c, ldc, d, ldd, e, lde, f, ldf,scale, dif, work, lwork, iwork, info)

Fortran 95:

call tgsyl(a, b, c, d, e, f [,ijob] [,trans] [,scale] [,dif] [,info])

Description

This routine solves the generalized Sylvester equation:

A R - L B = scale * C

D R - L E = scale * F

868


where R and L are unknown m-by-n matrices, (A, D), (B, E) and (C, F) are given matrix pairs ofsize m-by-m, n-by-n and m-by-n, respectively, with real/complex entries. (A, D) and (B, E) mustbe in generalized real-Schur/Schur canonical form, that is, A, B are upperquasi-triangular/triangular and D, E are upper triangular.

The solution (R, L) overwrites (C, F). The factor scale, 0 ≤ scale ≤ 1, is an output scalingfactor chosen to avoid overflow.

In matrix notation the above equation is equivalent to the following: solve Zx = scale * b,where Z is defined as

Here Ik is the identity matrix of size k and X' is the transpose/conjugate-transpose of X. kron(X,Y) is the Kronecker product between the matrices X and Y.

If trans = 'T' (for real flavors), or trans = 'C' (for complex flavors), the routine ?tgsylsolves the transposed/conjugate-transposed system Z ' y = scale * b, which is equivalentto solve for R and L in

A' R + D' L = scale * C

R B' + L E' = scale * (-F)

This case (trans = 'T' for stgsyl/dtgsyl or trans = 'C' for ctgsyl/ztgsyl) is used tocompute an one-norm-based estimate of Dif[(A, D), (B, E)], the separation between the matrixpairs (A,D) and (B,E), using slacon/clacon.

If ijob < 1, ?tgsyl computes a Frobenius norm-based estimate of Dif[(A, D), (B,E)]. That is,the reciprocal of a lower bound on the reciprocal of the smallest singular value of Z. This is alevel 3 BLAS algorithm.

Input Parameters

CHARACTER*1. Must be 'N', 'T', or 'C'.transIf trans = 'N', solve the generalized Sylvester equation.If trans = 'T', solve the 'transposed' system (for realflavors only).

869

If trans = 'C', solve the ' conjugate transposed' system(for complex flavors only).

INTEGER. Specifies what kind of functionality to beperformed:

ijob

If ijob =0 , solve the generalized Sylvester equation only;If ijob =1, perform the functionality of ijob =0 and ijob=3;If ijob =2, perform the functionality of ijob =0 and ijob=4;If ijob =3, only an estimate of Dif[(A, D), (B, E)] iscomputed (look ahead strategy is used);If ijob =4, only an estimate of Dif[(A, D), (B,E)] iscomputed (?gecon on sub-systems is used). If trans ='T' or 'C', ijob is not referenced.

INTEGER. The order of the matrices A and D, and the rowdimension of the matrices C, F, R and L.

m

INTEGER. The order of the matrices B and E, and the columndimension of the matrices C, F, R and L.

n

REAL for stgsyla, b, c, d, e, f, workDOUBLE PRECISION for dtgsylCOMPLEX for ctgsylDOUBLE COMPLEX for ztgsyl.Arrays:a(lda,*) contains the upper quasi-triangular (for real flavors)or upper triangular (for complex flavors) matrix A.The second dimension of a must be at least max(1, m).b(ldb,*) contains the upper quasi-triangular (for real flavors)or upper triangular (for complex flavors) matrix B. Thesecond dimension of b must be at least max(1, n).c (ldc,*) contains the right-hand-side of the first matrixequation in the generalized Sylvester equation (as definedby trans)The second dimension of c must be at least max(1, n).d (ldd,*) contains the upper triangular matrix D.The second dimension of d must be at least max(1, m).e (lde,*) contains the upper triangular matrix E.The second dimension of e must be at least max(1, n).

870


f (ldf,*) contains the right-hand-side of the second matrixequation in the generalized Sylvester equation (as definedby trans)The second dimension of f must be at least max(1, n).work is a workspace array, its dimension max(1,lwork).




INTEGER. The first dimension of d; at least max(1, m).ldd

INTEGER. The first dimension of e; at least max(1, n).lde

INTEGER. The first dimension of f; at least max(1, m).ldf

INTEGER.lwork

The dimension of the array work. lwork ≥ 1.

If ijob = 1 or 2 and trans = 'N', lwork ≥ max(1,2*m*n).If lwork = -1, then a workspace query is assumed; theroutine only calculates the optimal size of the work array,returns this value as the first entry of the work array, andno error message related to lwork is issued by xerbla. SeeApplication Notes for details.

INTEGER. Workspace array, DIMENSION at least (m+n+6)for real flavors, and at least (m+n+2) for complex flavors.

iwork

If ijob=0, iwork is not referenced.

Output Parameters

If ijob=0, 1, or 2, overwritten by the solution R.cIf ijob=3 or 4 and trans = 'N', c holds R, the solutionachieved during the computation of the Dif-estimate.

If ijob=0, 1, or 2, overwritten by the solution L.fIf ijob=3 or 4 and trans = 'N', f holds L, the solutionachieved during the computation of the Dif-estimate.

REAL for single-precision flavorsdifDOUBLE PRECISION for double-precision flavors.

871


On exit, dif is the reciprocal of a lower bound of thereciprocal of the Dif-function, that is, dif is an upper boundof Dif[(A, D), (B, E)] = sigma_min(Z), where Z as in (2).If ijob = 0, or trans = 'T' (for real flavors), or trans= 'C' (for complex flavors), dif is not touched.

REAL for single-precision flavorsscaleDOUBLE PRECISION for double-precision flavors.On exit, scale is the scaling factor in the generalizedSylvester equation.If 0 < scale < 1, c and f hold the solutions R and L,respectively, to a slightly perturbed system but the inputmatrices A, B, D and E have not been changed.If scale = 0, c and f hold the solutions R and L,respectively, to the homogeneous system with C = F = 0.Normally, scale = 1.

If info = 0, work(1) contains the minimum value of lworkrequired for optimum performance. Use this lwork forsubsequent runs.

work(1)

INTEGER.infoIf info = 0, the execution is successful.If info = -i, the i-th parameter had an illegal value.If info > 0, (A, D) and (B, E) have common or closeeigenvalues.



Specific details for the routine tgsyl interface are the following:

Holds the matrix A of size (m,m).a



Holds the matrix D of size (m,m).d

Holds the matrix E of size (n,n).e

Holds the matrix F of size (m,n).f

872

Must be 0, 1, 2, 3, or 4. The default value is 0.ijob


Application Notes





?tgsnaEstimates reciprocal condition numbers for specifiedeigenvalues and/or eigenvectors of a pair ofmatrices in generalized real Schur canonical form.

Syntax

Fortran 77:

call stgsna(job, howmny, select, n, a, lda, b, ldb, vl, ldvl, vr, ldvr, s,dif, mm, m, work, lwork, iwork, info)

call dtgsna(job, howmny, select, n, a, lda, b, ldb, vl, ldvl, vr, ldvr, s,dif, mm, m, work, lwork, iwork, info)

call ctgsna(job, howmny, select, n, a, lda, b, ldb, vl, ldvl, vr, ldvr, s,dif, mm, m, work, lwork, iwork, info)

call ztgsna(job, howmny, select, n, a, lda, b, ldb, vl, ldvl, vr, ldvr, s,dif, mm, m, work, lwork, iwork, info)

873


Fortran 95:

call tgsna(a, b [,s] [,dif] [,vl] [,vr] [,select] [,m] [,info])

Description

The real flavors stgsna/dtgsna of this routine estimate reciprocal condition numbers forspecified eigenvalues and/or eigenvectors of a matrix pair (A, B) in generalized real Schurcanonical form (or of any matrix pair (Q*A*ZT, Q*B*ZT) with orthogonal matrices Q and Z.

(A, B) must be in generalized real Schur form (as returned by sgges/dgges), that is, A is blockupper triangular with 1-by-1 and 2-by-2 diagonal blocks. B is upper triangular.

The complex flavors ctgsna/ztgsna estimate reciprocal condition numbers for specifiedeigenvalues and/or eigenvectors of a matrix pair (A, B). (A, B) must be in generalized Schurcanonical form , that is, A and B are both upper triangular.

Input Parameters

CHARACTER*1. Specifies whether condition numbers arerequired for eigenvalues or eigenvectors. Must be 'E' or'V' or 'B'.

job

If job = 'E', for eigenvalues only (compute s ).If job = 'V', for eigenvectors only (compute dif ).If job = 'B', for both eigenvalues and eigenvectors(compute both s and dif).

CHARACTER*1. Must be 'A' or 'S'.howmnyIf howmny = 'A' , compute condition numbers for alleigenpairs.If howmny = 'S' , compute condition numbers for selectedeigenpairs specified by the logical array select.

LOGICAL.selectArray, DIMENSION at least max (1, n).If howmny = 'S', select specifies the eigenpairs for whichcondition numbers are required.If howmny = 'A', select is not referenced.For real flavors:To select condition numbers for the eigenpair correspondingto a real eigenvalue omega(j), select(j) must be set to.TRUE.; to select condition numbers corresponding to a

874


complex conjugate pair of eigenvalues omega(j) andomega(j+1), either select(j) or select(j+1) must be setto .TRUE.For complex flavors:To select condition numbers for the corresponding j-theigenvalue and/or eigenvector, select(j) must be set to.TRUE..

INTEGER. The order of the square matrix pair (A, B)n

(n ≥ 0).

REAL for stgsnaa, b, vl, vr, workDOUBLE PRECISION for dtgsnaCOMPLEX for ctgsnaDOUBLE COMPLEX for ztgsna.Arrays:a(lda,*) contains the upper quasi-triangular (for real flavors)or upper triangular (for complex flavors) matrix A in the pair(A, B).The second dimension of a must be at least max(1, n).b(ldb,*) contains the upper triangular matrix B in the pair(A, B). The second dimension of b must be at least max(1,n).If job = 'E' or 'B', vl(ldvl,*) must contain lefteigenvectors of (A, B), corresponding to the eigenpairsspecified by howmny and select. The eigenvectors must bestored in consecutive columns of vl, as returned by ?tgevc.If job = 'V', vl is not referenced. The second dimensionof vl must be at least max(1, m).If job = 'E' or 'B', vr(ldvr,*) must contain righteigenvectors of (A, B), corresponding to the eigenpairsspecified by howmny and select. The eigenvectors must bestored in consecutive columns of vr, as returned by ?tgevc.If job = 'V', vr is not referenced. The second dimensionof vr must be at least max(1, m).work is a workspace array, its dimension max(1,lwork).If job = 'E', work is not referenced.


875



INTEGER. The first dimension of vl; ldvl ≥ 1.ldvl

If job = 'E' or 'B', then ldvl ≥ max(1, n).

INTEGER. The first dimension of vr; ldvr ≥ 1.ldvr

If job = 'E' or 'B', then ldvr ≥ max(1, n).

INTEGER. The number of elements in the arrays s and dif

(mm ≥ m).

mm


lwork ≥ max(1, n).

If job = 'V' or 'B', lwork ≥ 2*n*(n+2)+16 for real

flavors, and lwork ≥ max(1, 2*n*n) for complex flavors.If lwork = -1, then a workspace query is assumed; theroutine only calculates the optimal size of the work array,returns this value as the first entry of the work array, andno error message related to lwork is issued by xerbla. SeeApplication Notes for details.

INTEGER. Workspace array, DIMENSION at least (n+6) forreal flavors, and at least (n+2) for complex flavors.

iwork

If ijob = 'E', iwork is not referenced.

Output Parameters

REAL for single-precision flavorssDOUBLE PRECISION for double-precision flavors.Array, DIMENSION (mm ).If job = 'E' or 'B', contains the reciprocal conditionnumbers of the selected eigenvalues, stored in consecutiveelements of the array.If job = 'V', s is not referenced.For real flavors:For a complex conjugate pair of eigenvalues two consecutiveelements of s are set to the same value. Thus, s(j), dif(j),and the j-th columns of vl and vr all correspond to thesame eigenpair (but not in general the j-th eigenpair, unlessall eigenpairs are selected).

876


REAL for single-precision flavorsdifDOUBLE PRECISION for double-precision flavors.Array, DIMENSION (mm ).If job = 'V' or 'B', contains the estimated reciprocalcondition numbers of the selected eigenvectors, stored inconsecutive elements of the array.If the eigenvalues cannot be reordered to compute dif(j),dif(j) is set to 0; this can only occur when the true valuewould be very small anyway.If job = 'E', dif is not referenced.For real flavors:For a complex eigenvector, two consecutive elements ofdif are set to the same value.For complex flavors:For each eigenvalue/vector specified by select, dif storesa Frobenius norm-based estimate of Difl.

INTEGER. The number of elements in the arrays s and difused to store the specified condition numbers; for eachselected eigenvalue one element is used.

m

If howmny = 'A', m is set to n.

work(1)work(1)If job is not 'E' and info = 0, on exit, work(1) containsthe minimum value of lwork required for optimumperformance. Use this lwork for subsequent runs.




Specific details for the routine tgsna interface are the following:



Holds the vector of length (mm).s

877


Holds the vector of length (mm).dif




Restored based on the presence of the argument select as follows:howmny = 'S', if select is present, howmny = 'A', if select isomitted.

howmny

Restored based on the presence of arguments s and dif as follows:job = 'B', if both s and dif are present, job = 'E', if s is presentand dif omitted, job = 'V', if s is omitted and dif present, Note thatthere will be an error condition if both s and dif are omitted.

job

Application Notes





Generalized Singular Value Decomposition

This section describes LAPACK computational routines used for finding the generalized singularvalue decomposition (GSVD) of two matrices A and B as

UHAQ = D1 * (0 R),

VHBQ = D2 * (0 R),

where U, V, and Q are orthogonal/unitary matrices, R is a nonsingular upper triangular matrix,and D1, D2 are “diagonal” matrices of the structure detailed in the routines description section.

878


Table 4-7 lists LAPACK routines (Fortran-77 interface) that perform generalized singular valuedecomposition of matrices. Respective routine names in Fortran-95 interface are without thefirst symbol (see Routine Naming Conventions).

Table 4-7 Computational Routines for Generalized Singular Value Decomposition

Operation performedRoutine name

Computes the preprocessing decomposition for the generalizedSVD

?ggsvp

Computes the generalized SVD of two upper triangular ortrapezoidal matrices

?tgsja

You can use routines listed in the above table as well as the driver routine ?ggsvd to find theGSVD of a pair of general rectangular matrices.

?ggsvpComputes the preprocessing decomposition for thegeneralized SVD.

Syntax

Fortran 77:

call sggsvp(jobu, jobv, jobq, m, p, n, a, lda, b, ldb, tola, tolb, k, l, u,ldu, v, ldv, q, ldq, iwork, tau, work, info)

call dggsvp(jobu, jobv, jobq, m, p, n, a, lda, b, ldb, tola, tolb, k, l, u,ldu, v, ldv, q, ldq, iwork, tau, work, info)

call cggsvp (jobu, jobv, jobq, m, p, n, a, lda, b, ldb, tola, tolb, k, l, u,ldu, v, ldv, q, ldq, iwork, rwork, tau, work, info)

call zggsvp(jobu, jobv, jobq, m, p, n, a, lda, b, ldb, tola, tolb, k, l, u,ldu, v, ldv, q, ldq, iwork, rwork, tau, work, info)

Fortran 95:

call ggsvp(a, b, tola, tolb [, k] [,l] [,u] [,v] [,q] [,info])

Description

This routine computes orthogonal matrices U, V and Q such that

879


where the k-by-k matrix A12 and l-by-l matrix B13 are nonsingular upper triangular; A23 is

l-by-l upper triangular if m-k-l ≥0, otherwise A23 is (m-k)-by-l upper trapezoidal. The sumk+l is equal to the effective numerical rank of the (m+p)-by-n matrix (AH,BH)H.

This decomposition is the preprocessing step for computing the Generalized Singular ValueDecomposition (GSVD), see subroutine ?ggsvd.

Input Parameters

CHARACTER*1. Must be 'U' or 'N'.jobuIf jobu = 'U', orthogonal/unitary matrix U is computed.If jobu = 'N', U is not computed.

CHARACTER*1. Must be 'V' or 'N'.jobv

880


If jobv = 'V', orthogonal/unitary matrix V is computed.If jobv = 'N', V is not computed.

CHARACTER*1. Must be 'Q' or 'N'.jobqIf jobq = 'Q', orthogonal/unitary matrix Q is computed.If jobq = 'N', Q is not computed.


INTEGER. The number of rows of the matrix B (p ≥ 0).p


(n ≥ 0).

n

REAL for sggsvpa, b, tau, workDOUBLE PRECISION for dggsvpCOMPLEX for cggsvpDOUBLE COMPLEX for zggsvp.Arrays:a(lda,*) contains the m-by-n matrix A.The second dimension of a must be at least max(1, n).b(ldb,*) contains the p-by-n matrix B.The second dimension of b must be at least max(1, n).tau(*) is a workspace array.The dimension of tau must be at least max(1, n).work(*) is a workspace array.The dimension of work must be at least max(1, 3n, m, p).



REAL for single-precision flavorstola, tolbDOUBLE PRECISION for double-precision flavors.tola and tolb are the thresholds to determine the effectivenumerical rank of matrix B and a subblock of A. Generally,they are set totola = max(m, n)*||A||*MACHEPS,tolb = max(p, n)*||B||*MACHEPS.The size of tola and tolb may affect the size of backwarderrors of the decomposition.

INTEGER. The first dimension of the output array u . ldu

≥ max(1, m) if jobu = 'U'; ldu ≥ 1 otherwise.

ldu

881


INTEGER. The first dimension of the output array v . ldv

≥ max(1, p) if jobv = 'V'; ldv ≥ 1 otherwise.

ldv

INTEGER. The first dimension of the output array q . ldq

≥ max(1, n) if jobq = 'Q'; ldq ≥ 1 otherwise.

ldq


REAL for cggsvprworkDOUBLE PRECISION for zggsvp.Workspace array, DIMENSION at least max(1, 2n). Used incomplex flavors only.

Output Parameters

Overwritten by the triangular (or trapezoidal) matrixdescribed in the Description section.

a

Overwritten by the triangular matrix described in theDescription section.

b

INTEGER. On exit, k and l specify the dimension ofsubblocks. The sum k + l is equal to effective numericalrank of (AH, BH)H.

k, l

REAL for sggsvpu, v, qDOUBLE PRECISION for dggsvpCOMPLEX for cggsvpDOUBLE COMPLEX for zggsvp.Arrays:If jobu = 'U', u(ldu,*) contains the orthogonal/unitarymatrix U.The second dimension of u must be at least max(1, m).If jobu = 'N', u is not referenced.If jobv = 'V', v(ldv,*) contains the orthogonal/unitarymatrix V.The second dimension of v must be at least max(1, m).If jobv = 'N', v is not referenced.If jobq = 'Q', q(ldq,*) contains the orthogonal/unitarymatrix Q.The second dimension of q must be at least max(1, n).If jobq = 'N', q is not referenced.

882





Specific details for the routine ggsvp interface are the following:


Holds the matrix B of size (p,n).b

Holds the matrix U of size (m,m).u

Holds the matrix V of size (p,m).v


Restored based on the presence of the argument u as follows:jobujobu = 'U', if u is present,jobu = 'N', if u is omitted.

Restored based on the presence of the argument v as follows:jobvjobz = 'V', if v is present,jobz = 'N', if v is omitted.

Restored based on the presence of the argument q as follows:jobqjobz = 'Q', if q is present,jobz = 'N', if q is omitted.

883


?tgsjaComputes the generalized SVD of two uppertriangular or trapezoidal matrices.

Syntax

Fortran 77:

call stgsja(jobu, jobv, jobq, m, p, n, k, l, a, lda, b, ldb, tola, tolb, alpha,beta, u, ldu, v, ldv, q, ldq, work, ncycle, info)

call dtgsja(jobu, jobv, jobq, m, p, n, k, l, a, lda, b, ldb, tola, tolb, alpha,beta, u, ldu, v, ldv, q, ldq, work, ncycle, info)

call ctgsja(jobu, jobv, jobq, m, p, n, k, l, a, lda, b, ldb, tola, tolb, alpha,beta, u, ldu, v, ldv, q, ldq, work, ncycle, info)

call ztgsja(jobu, jobv, jobq, m, p, n, k, l, a, lda, b, ldb, tola, tolb, alpha,beta, u, ldu, v, ldv, q, ldq, work, ncycle, info)

Fortran 95:

call tgsja(a, b, tola, tolb, k, l [,u] [,v] [,q] [,jobu] [,jobv] [,jobq][,alpha] [,beta] [,ncycle] [,info])

Description

This routine computes the generalized singular value decomposition (GSVD) of two real/complexupper triangular (or trapezoidal) matrices A and B. On entry, it is assumed that matrices A andB have the following forms, which may be obtained by the preprocessing subroutine ?ggsvpfrom a general m-by-n matrix A and p-by-n matrix B:

884


where the k-by-k matrix A12 and l-by-l matrix B13 are nonsingular upper triangular; A23 is

l-by-l upper triangular if m-k-l ≥0, otherwise A23 is (m-k)-by-l upper trapezoidal.

On exit,

UH A Q = D1*(0 R ), VH B Q = D2*(0 R ),

where U, V and Q are orthogonal/unitary matrices, R is a nonsingular upper triangular matrix,and D1 and D2 are “diagonal” matrices, which are of the following structures:

If m-k-l ≥0,

885


where

C = diag ( alpha(k+1),...,alpha(k+l))

S = diag ( beta(k+1),...,beta(k+l))

C2 + S2 = I

R is stored in a(1:k+l, n-k-l+1:n ) on exit.

If m-k-l < 0,

886

where

C = diag ( alpha(K+1),...,alpha(m)),

S = diag ( beta(K+1),...,beta(m)),

C2 + S2 = I

On exit, is stored in a(1:m, n-k-l+1:n ) and R33 is stored

in b(m-k+1:l, n+m-k-l+1:n ).

The computation of the orthogonal/unitary transformation matrices U, V or Q is optional. Thesematrices may either be formed explicitly, or they may be postmultiplied into input matrices U1,V1, or Q1.

Input Parameters

CHARACTER*1. Must be 'U', 'I', or 'N'.jobu

887


If jobu = 'U', u must contain an orthogonal/unitary matrixU1 on entry.If jobu = 'I', u is initialized to the unit matrix.If jobu = 'N', u is not computed.

CHARACTER*1. Must be 'V', 'I', or 'N'.jobvIf jobv = 'V', v must contain an orthogonal/unitary matrixV1 on entry.If jobv = 'I', v is initialized to the unit matrix.If jobv = 'N', v is not computed.

CHARACTER*1. Must be 'Q', 'I', or 'N'.jobqIf jobq = 'Q', q must contain an orthogonal/unitary matrixQ1 on entry.If jobq = 'I', q is initialized to the unit matrix.If jobq = 'N', q is not computed.




(n ≥ 0).

n

INTEGER. Specify the subblocks in the input matrices A andB, whose GSVD is going to be computed by ?tgsja.

k, l

REAL for stgsjaa,b,u,v,q,workDOUBLE PRECISION for dtgsjaCOMPLEX for ctgsjaDOUBLE COMPLEX for ztgsja.Arrays:a(lda,*) contains the m-by-n matrix A.The second dimension of a must be at least max(1, n).b(ldb,*) contains the p-by-n matrix B.The second dimension of b must be at least max(1, n).If jobu = 'U', u(ldu,*) must contain a matrix U1 (usuallythe orthogonal/unitary matrix returned by ?ggsvp).The second dimension of u must be at least max(1, m).If jobv = 'V', v(ldv,*) must contain a matrix V1 (usuallythe orthogonal/unitary matrix returned by ?ggsvp).The second dimension of v must be at least max(1, p).

888


If jobq = 'Q', q(ldq,*) must contain a matrix Q1 (usuallythe orthogonal/unitary matrix returned by ?ggsvp).The second dimension of q must be at least max(1, n).work(*) is a workspace array.The dimension of work must be at least max(1, 2n).



INTEGER. The first dimension of the array u .ldu

ldu ≥ max(1, m) if jobu = 'U'; ldu ≥ 1 otherwise.

INTEGER. The first dimension of the array v .ldv

ldv ≥ max(1, p) if jobv = 'V'; ldv ≥ 1 otherwise.

INTEGER. The first dimension of the array q .ldq

ldq ≥ max(1, n) if jobq = 'Q'; ldq ≥ 1 otherwise.

REAL for single-precision flavorstola, tolbDOUBLE PRECISION for double-precision flavors.tola and tolb are the convergence criteria for theJacobi-Kogbetliantz iteration procedure. Generally, they arethe same as used in ?ggsvp:tola = max(m, n)*|A|*MACHEPS,tolb = max(p, n)*|B|*MACHEPS.

Output Parameters

On exit, a(n-k+1:n, 1:min(k+l, m)) contains the triangularmatrix R or part of R.

a

On exit, if necessary, b(m-k+1: l, n+m-k-l+1: n)) containsa part of R.

b

REAL for single-precision flavorsalpha, betaDOUBLE PRECISION for double-precision flavors.Arrays, DIMENSION at least max(1, n). Contain thegeneralized singular value pairs of A and B:alpha(1:k) = 1,beta(1:k) = 0,

and if m-k-l ≥ 0,alpha(k+1:k+l) = diag(C),

889


beta(k+1:k+l) = diag(S),or if m-k-l < 0,alpha(k+1:m)= C, alpha(m+1:k+l)=0beta(K+1:M) = S,beta(m+1:k+l) = 1.Furthermore, if k+l < n,alpha(k+l+1:n)= 0 andbeta(k+l+1:n) = 0.

If jobu = 'I', u contains the orthogonal/unitary matrix U.uIf jobu = 'U', u contains the product U1*U.If jobu = 'N', u is not referenced.

If jobv = 'I', v contains the orthogonal/unitary matrix U.vIf jobv = 'V', v contains the product V1*V.If jobv = 'N', v is not referenced.

If jobq = 'I', q contains the orthogonal/unitary matrix U.qIf jobq = 'Q', q contains the product Q1*Q.If jobq = 'N', q is not referenced.

INTEGER. The number of cycles required for convergence.ncycle

INTEGER.infoIf info = 0, the execution is successful.If info = -i, the i-th parameter had an illegal value.If info = 1, the procedure does not converge after MAXITcycles.



Specific details for the routine tgsja interface are the following:



Holds the matrix U of size (m,m).u

Holds the matrix V of size (p,p).v


Holds the vector of length (n).alpha

890


If omitted, this argument is restored based on the presence of argumentu as follows:

jobu

jobu = 'U', if u is present,jobu = 'N', if u is omitted.If present, jobu must be equal to 'I' or 'U' and the argument u mustalso be present.Note that there will be an error condition if jobu is present and uomitted.

If omitted, this argument is restored based on the presence of argumentv as follows:

jobv

jobv = 'V', if v is present,jobv = 'N', if v is omitted.If present, jobv must be equal to 'I' or 'V' and the argument v mustalso be present.Note that there will be an error condition if jobv is present and vomitted.

If omitted, this argument is restored based on the presence of argumentq as follows:

jobq

jobq = 'Q', if q is present,jobq = 'N', if q is omitted.If present, jobq must be equal to 'I' or 'Q' and the argument q mustalso be present.Note that there will be an error condition if jobq is present and qomitted.

Driver RoutinesEach of the LAPACK driver routines solves a complete problem. To arrive at the solution, driverroutines typically call a sequence of appropriate computational routines.

Driver routines are described in the following sections :

Linear Least Squares (LLS) Problems

Generalized LLS Problems

Symmetric Eigenproblems

Nonsymmetric Eigenproblems

891



Generalized Symmetric Definite Eigenproblems

Generalized Nonsymmetric Eigenproblems

Linear Least Squares (LLS) Problems

This section describes LAPACK driver routines used for solving linear least-squares problems.Table 4-8 lists all such routines for Fortran-77 interface. Respective routine names in Fortran-95interface are without the first symbol (see Routine Naming Conventions).

Table 4-8 Driver Routines for Solving LLS Problems

Operation performedRoutine Name

Uses QR or LQ factorization to solve a overdetermined or underdeterminedlinear system with full rank matrix.

?gels

Computes the minimum-norm solution to a linear least squares problemusing a complete orthogonal factorization of A.

?gelsy

Computes the minimum-norm solution to a linear least squares problemusing the singular value decomposition of A.

?gelss

Computes the minimum-norm solution to a linear least squares problemusing the singular value decomposition of A and a divide and conquermethod.

?gelsd

?gelsUses QR or LQ factorization to solve aoverdetermined or underdetermined linear systemwith full rank matrix.

Syntax

Fortran 77:

call sgels(trans, m, n, nrhs, a, lda, b, ldb, work, lwork, info)

call dgels(trans, m, n, nrhs, a, lda, b, ldb, work, lwork, info)

call cgels(trans, m, n, nrhs, a, lda, b, ldb, work, lwork, info)

call zgels(trans, m, n, nrhs, a, lda, b, ldb, work, lwork, info)

892


Fortran 95:

call gels(a, b [,trans] [,info])

Description

This routine solves overdetermined or underdetermined real/ complex linear systems involvingan m-by-n matrix A, or its transpose/ conjugate-transpose, using a QR or LQ factorization of A.It is assumed that A has full rank.

The following options are provided:

1. If trans = 'N' and m ≥ n: find the least squares solution of an overdetermined system,that is, solve the least squares problem

minimize || b - A x ||2

2. If trans = 'N' and m < n: find the minimum norm solution of an underdetermined systemA*X = B.

3. If trans = 'T' or 'C' and m ≥ n: find the minimum norm solution of an undeterminedsystem AH*X = B.

4. If trans = 'T' or 'C' and m < n: find the least squares solution of an overdeterminedsystem, that is, solve the least squares problem

minimize || b - AH x ||2

Several right hand side vectors b and solution vectors x can be handled in a single call; theyare stored as the columns of the m-by-nrhs right hand side matrix B and the n-by-nrh solutionmatrix X.

Input Parameters

CHARACTER*1. Must be 'N', 'T', or 'C'.transIf trans = 'N', the linear system involves matrix A;If trans = 'T', the linear system involves the transposedmatrix AT (for real flavors only);If trans = 'C', the linear system involves theconjugate-transposed matrix AH (for complex flavors only).


INTEGER. The number of columns of the matrix An

(n ≥ 0).

893

INTEGER. The number of right-hand sides; the number of

columns in B (nrhs ≥ 0).

nrhs

REAL for sgelsa, b, workDOUBLE PRECISION for dgelsCOMPLEX for cgelsDOUBLE COMPLEX for zgels.Arrays:a(lda,*) contains the m-by-n matrix A.The second dimension of a must be at least max(1, n).b(ldb,*) contains the matrix B of right hand side vectors,stored columnwise; B is m-by-nrhs if trans = 'N' , orn-by-nrhs if trans = 'T' or 'C'.The second dimension of b must be at least max(1, nrhs).work is a workspace array, its dimension max(1, lwork).


INTEGER. The first dimension of b; must be at least max(1,m, n).

ldb

INTEGER. The size of the work array; must be at least min(m, n)+max(1, m, n, nrhs).

lwork


Output Parameters

On exit, overwritten by the factorization data as follows:a

if m ≥ n, array a contains the details of the QR factorizationof the matrix A as returned by ?geqrf;

if m ≥ n, array a contains the details of the LQ factorizationof the matrix A as returned by ?gelqf.

If info = 0, b overwritten by the solution vectors, storedcolumnwise:

b

894


if trans = 'N' and m ≥ n, rows 1 to n of b contain theleast squares solution vectors; the residual sum of squaresfor the solution in each column is given by the sum ofsquares of modulus of elements n+1 to m in that column;

if trans = 'N' and m ≥ n, rows 1 to n of b contain theminimum norm solution vectors;if trans = 'T' or 'C' and m < n, rows 1 to m of b containthe minimum norm solution vectors; if trans = 'T' or 'C'and m < n, rows 1 to m of b contain the least squaressolution vectors; the residual sum of squares for the solutionin each column is given by the sum of squares of modulusof elements m+1 to n in that column.


work(1)

INTEGER.infoIf info = 0, the execution is successful.If info = -i, the i-th parameter had an illegal value.If info = i, the i-th diagonal element of the triangularfactor of A is zero, so that A does not have full rank; theleast squares solution could not be computed.



Specific details for the routine gels interface are the following:


Holds the matrix of size max(m,n)-by-nrhs.bIf trans = 'N', then, on entry, the size of b is m-by-nrhs,If trans = 'T', then, on entry, the size of b is n-by-nrhs,


895

Application Notes

For better performance, try using lwork = min (m, n)+max(1, m, n, nrhs)*blocksize,where blocksize is a machine-dependent value (typically, 16 to 64) required for optimumperformance of the blocked algorithm.





?gelsyComputes the minimum-norm solution to a linearleast squares problem using a complete orthogonalfactorization of A.

Syntax

Fortran 77:

call sgelsy(m, n, nrhs, a, lda, b, ldb, jpvt, rcond, rank, work, lwork, info)

call dgelsy(m, n, nrhs, a, lda, b, ldb, jpvt, rcond, rank, work, lwork, info)

call cgelsy(m, n, nrhs, a, lda, b, ldb, jpvt, rcond, rank, work, lwork, rwork,info)

call zgelsy(m, n, nrhs, a, lda, b, ldb, jpvt, rcond, rank, work, lwork, rwork,info)

Fortran 95:

call gelsy(a, b [,rank] [,jpvt] [,rcond] [,info])

896


Description

This routine computes the minimum-norm solution to a real/complex linear least squaresproblem:

minimize || b - A x ||2

using a complete orthogonal factorization of A. A is an m-by-n matrix which may be rank-deficient.Several right hand side vectors b and solution vectors x can be handled in a single call; theyare stored as the columns of the m-by-nrhs right hand side matrix B and the n-by-nrhs solutionmatrix X.

The routine first computes a QR factorization with column pivoting:

with R11 defined as the largest leading submatrix whose estimated condition number is lessthan 1/rcond. The order of R11, rank, is the effective rank of A. Then, R22 is considered to benegligible, and R12 is annihilated by orthogonal/unitary transformations from the right, arrivingat the complete orthogonal factorization:

The minimum-norm solution is then

where Q1 consists of the first rank columns of Q. This routine is basically identical to theoriginal?gelsx except three differences:

897


• The call to the subroutine ?geqpf has been substituted by the call to the subroutine ?geqp3.This subroutine is a BLAS-3 version of the QR factorization with column pivoting.

• Matrix B (the right hand side) is updated with BLAS-3.

• The permutation of matrix B (the right hand side) is faster and more simple.

Input Parameters



(n ≥ 0).



nrhs

REAL for sgelsya, b, workDOUBLE PRECISION for dgelsyCOMPLEX for cgelsyDOUBLE COMPLEX for zgelsy.Arrays:a(lda,*) contains the m-by-n matrix A.The second dimension of a must be at least max(1, n).b(ldb,*) contains the m-by-nrhs right hand side matrix B.The second dimension of b must be at least max(1, nrhs).work is a workspace array, its dimension max(1, lwork).



ldb

INTEGER.jpvtArray, DIMENSION at least max(1, n).

On entry, if jpvt(i)≠ 0, the i-th column of A is permutedto the front of A*P, otherwise the i-th column of A is a freecolumn.

REAL for single-precision flavorsrcondDOUBLE PRECISION for double-precision flavors.

898


rcond is used to determine the effective rank of A, which isdefined as the order of the largest leading triangularsubmatrix R11 in the QR factorization with pivoting of A,whose estimated condition number < 1/rcond.

INTEGER. The size of the work array.lworkIf lwork = -1, then a workspace query is assumed; theroutine only calculates the optimal size of the work array,returns this value as the first entry of the work array, andno error message related to lwork is issued by xerbla. SeeApplication Notes for the suggested value of lwork.

REAL for cgelsy DOUBLE PRECISION for zgelsy.Workspace array, DIMENSION at least max(1, 2n). Used incomplex flavors only.

rwork

Output Parameters

On exit, overwritten by the details of the completeorthogonal factorization of A.

a

Overwritten by the n-by-nrhs solution matrix X.b

On exit, if jpvt(i)= k, then the ith column of AP was thek-th column of A.

jpvt

INTEGER. The effective rank of A, that is, the order of thesubmatrix R11. This is the same as the order of the submatrixT11 in the complete orthogonal factorization of A.

rank


If info = -i, the i-th parameter had an illegal value.



Specific details for the routine gelsy interface are the following:


899

Holds the matrix of size max(m,n)-by-nrhs. On entry, contains them-by-nrhs right hand side matrix B, On exit, overwritten by then-by-nrhs solution matrix X.

b

Holds the vector of length (n). Default value for this element is jpvt(i)= 0.

jpvt

Default value for this element is rcond = 100*EPSILON(1.0_WP).rcond

Application Notes

For real flavors:

The unblocked strategy requires that:

lwork ≥ max( mn+3n+1, 2*mn + nrhs ),

where mn = min( m, n ).

The block algorithm requires that:

lwork ≥ max( mn+2n+nb*(n+1), 2*mn+nb*nrhs ),

where nb is an upper bound on the blocksize returned by ilaenv for the routinessgeqp3/dgeqp3, stzrzf/dtzrzf, stzrqf/dtzrqf, sormqr/dormqr, and sormrz/dormrz.


The unblocked strategy requires that:

lwork ≥ mn + max( 2*mn, n+1, mn + nrhs ),

where mn = min( m, n ).

The block algorithm requires that:

lwork < mn + max(2*mn, nb*(n+1), mn+mn*nb, mn+ nb*nrhs ),

where nb is an upper bound on the blocksize returned by ilaenv for the routinescgeqp3/zgeqp3, ctzrzf/ztzrzf, ctzrqf/ztzrqf, cunmqr/zunmqr, and cunmrz/zunmrz.



900



?gelssComputes the minimum-norm solution to a linearleast squares problem using the singular valuedecomposition of A.

Syntax

Fortran 77:

call sgelss(m, n, nrhs, a, lda, b, ldb, s, rcond, rank, work, lwork, info)

call dgelss(m, n, nrhs, a, lda, b, ldb, s, rcond, rank, work, lwork, info)

call cgelss(m, n, nrhs, a, lda, b, ldb, s, rcond, rank, work, lwork, rwork,info)

call zgelss(m, n, nrhs, a, lda, b, ldb, s, rcond, rank, work, lwork, rwork,info)

Fortran 95:

call gelss(a, b [,rank] [,s] [,rcond] [,info])

Description

This routine computes the minimum norm solution to a real linear least squares problem:

minimize || b - Ax ||2

using the singular value decomposition (SVD) of A. A is an m-by-n matrix which may berank-deficient. Several right hand side vectors b and solution vectors x can be handled in asingle call; they are stored as the columns of the m-by-nrhs right hand side matrix B and then-by-nrhs solution matrix X. The effective rank of A is determined by treating as zero thosesingular values which are less than rcond times the largest singular value.

901


Input Parameters



(n ≥ 0).

INTEGER. The number of right-hand sides; the number ofcolumns in B

nrhs

(nrhs ≥ 0).

REAL for sgelssa, b, workDOUBLE PRECISION for dgelssCOMPLEX for cgelssDOUBLE COMPLEX for zgelss.Arrays:a(lda,*) contains the m-by-n matrix A.The second dimension of a must be at least max(1, n).b(ldb,*) contains the m-by-nrhs right hand side matrix B.The second dimension of b must be at least max(1, nrhs).work is a workspace array, its dimension max(1, lwork).



ldb

REAL for single-precision flavorsrcondDOUBLE PRECISION for double-precision flavors.rcond is used to determine the effective rank of A. Singular

values s(i) ≤ rcond *s(1) are treated as zero.If rcond <0, machine precision is used instead.

INTEGER. The size of the work array; lwork≥ 1.lwork


REAL for cgelssrworkDOUBLE PRECISION for zgelss.

902

Workspace array used in complex flavors only. DIMENSIONat least max(1, 5*min(m, n)).

Output Parameters

On exit, the first min(m, n) rows of A are overwritten withits right singular vectors, stored row-wise.

a


If m≥n and rank = n, the residual sum-of-squares for thesolution in the i-th column is given by the sum of squaresof modulus of elements n+1:m in that column.

REAL for single precision flavorssDOUBLE PRECISION for double precision flavors.Array, DIMENSION at least max(1, min(m, n)). The singularvalues of A in decreasing order. The condition number of Ain the 2-norm isk2(A) = s(1)/ s(min(m, n)) .

INTEGER. The effective rank of A, that is, the number ofsingular values which are greater than rcond *s(1).

rank

If info = 0, on exit, work(1) contains the minimum valueof lwork required for optimum performance. Use this lworkfor subsequent runs.

work(1)

INTEGER.infoIf info = 0, the execution is successful.If info = -i, the i-th parameter had an illegal value.If info = i, then the algorithm for computing the SVDfailed to converge; i indicates the number of off-diagonalelements of an intermediate bidiagonal form which did notconverge to zero.



Specific details for the routine gelss interface are the following:


903



b

Holds the vector of length min(m,n).s


Application Notes

For real flavors:

lwork ≥ 3*min(m, n)+ max( 2*min(m, n), max(m, n), nrhs)


lwork ≥ 2*min(m, n)+ max(m, n, nrhs)

For good performance, lwork should generally be larger.





904


?gelsdComputes the minimum-norm solution to a linearleast squares problem using the singular valuedecomposition of A and a divide and conquermethod.

Syntax

Fortran 77:

call sgelsd(m, n, nrhs, a, lda, b, ldb, s, rcond, rank, work, lwork, iwork,info)

call dgelsd(m, n, nrhs, a, lda, b, ldb, s, rcond, rank, work, lwork, iwork,info)

call cgelsd(m, n, nrhs, a, lda, b, ldb, s, rcond, rank, work, lwork, rwork,iwork, info)

call zgelsd(m, n, nrhs, a, lda, b, ldb, s, rcond, rank, work, lwork, rwork,iwork, info)

Fortran 95:

call gelsd(a, b [,rank] [,s] [,rcond] [,info])

Description

This routine computes the minimum-norm solution to a real linear least squares problem:

minimize || b - Ax ||2

using the singular value decomposition (SVD) of A. A is an m-by-n matrix which may berank-deficient.

Several right hand side vectors b and solution vectors x can be handled in a single call; theyare stored as the columns of the m-by-nrhs right hand side matrix B and the n-by-nrhs solutionmatrix X.

The problem is solved in three steps:

1. Reduce the coefficient matrix A to bidiagonal form with Householder transformations, reducingthe original problem into a "bidiagonal least squares problem" (BLS).

2. Solve the BLS using a divide and conquer approach.

3. Apply back all the Householder transformations to solve the original least squares problem.

905


The effective rank of A is determined by treating as zero those singular values which are lessthan rcond times the largest singular value.

The routine uses auxiliary routines ?lals0 and ?lalsa.

Input Parameters



(n ≥ 0).



nrhs

REAL for sgelsda, b, workDOUBLE PRECISION for dgelsdCOMPLEX for cgelsdDOUBLE COMPLEX for zgelsd.Arrays:a(lda,*) contains the m-by-n matrix A.The second dimension of a must be at least max(1, n).b(ldb,*) contains the m-by-nrhs right hand side matrix B.The second dimension of b must be at least max(1, nrhs).work is a workspace array, its dimension max(1, lwork).



ldb

REAL for single-precision flavorsrcondDOUBLE PRECISION for double-precision flavors.rcond is used to determine the effective rank of A. Singular

values s(i) ≤ rcond *s(1) are treated as zero. If rcond

≤ 0, machine precision is used instead.

INTEGER. The size of the work array; lwork ≥ 1.lwork

If lwork = -1, then a workspace query is assumed; theroutine only calculates the optimal size of the array workand the minimum sizes of the arrays rwork and iwork, and

906


returns these values as the first entries of the work, rworkand iwork arrays, and no error message related to lworkis issued by xerbla.See Application Notes for the suggested value of lwork.

INTEGER. Workspace array. See Application Notes for thesuggested dimension of iwork.

iwork

REAL for cgelsdrworkDOUBLE PRECISION for zgelsd.Workspace array, used in complex flavors only. SeeApplication Notes for the suggested dimension of rwork.

Output Parameters

On exit, A has been overwritten.a


If m ≥ n and rank = n, the residual sum-of-squares forthe solution in the i-th column is given by the sum ofsquares of modulus of elements n+1:m in that column.

REAL for single precision flavorssDOUBLE PRECISION for double precision flavors.Array, DIMENSION at least max(1, min(m, n)). The singularvalues of A in decreasing order. The condition number of Ain the 2-norm isk2(A) = s(1)/ s(min(m, n)).

INTEGER. The effective rank of A, that is, the number ofsingular values which are greater than rcond *s(1).

rank


work(1)

If info = 0, on exit, rwork(1) returns the minimum sizeof the workspace array iwork required for optimumperformance.

rwork(1)

If info = 0, on exit, iwork(1) returns the minimum sizeof the workspace array iwork required for optimumperformance.

iwork(1)

INTEGER.info

907


If info = 0, the execution is successful.If info = -i, the i-th parameter had an illegal value.If info = i, then the algorithm for computing the SVDfailed to converge; i indicates the number of off-diagonalelements of an intermediate bidiagonal form that did notconverge to zero.



Specific details for the routine gelsd interface are the following:



b

Holds the vector of length min(m,n).s


Application Notes

The divide and conquer algorithm makes very mild assumptions about floating point arithmetic.It will work on machines with a guard digit in add/subtract. It could conceivably fail onhexadecimal or decimal machines without guard digits, but we know of none.

The exact minimum amount of workspace needed depends on m, n and nrhs. The size lworkof the workspace array work must be as given below.

For real flavors:

If m ≥ n,

lwork ≥ 12n + 2n*smlsiz + 8n*nlvl + n*nrhs + (smlsiz+1)2;

If m < n,

lwork ≥ 12m + 2m*smlsiz + 8m*nlvl + m*nrhs + (smlsiz+1)2;


If m ≥ n,

908

lwork< 2n + n*nrhs;

If m < n,

lwork ≥ 2m + m*nrhs;

where smlsiz is returned by ilaenv and is equal to the maximum size of the subproblems atthe bottom of the computation tree (usually about 25), and

nlvl = INT( log2( min( m, n )/(smlsiz+1)) ) + 1.






The dimension of the workspace array iwork must be at least

3*min( m, n )*nlvl + 11*min( m, n ).

The dimension of the workspace array iwork (for complex flavors) must be at least max(1,lrwork).

lrwork ≥ 10n + 2n*smlsiz + 8n*nlvl + 3*smlsiz*nrhs + (smlsiz+1)2 if m ≥ n, and

lrwork ≥ 10m + 2m*smlsiz + 8m*nlvl + 3*smlsiz*nrhs + (smlsiz+1)2 if m ≥ n.

Generalized LLS Problems

This section describes LAPACK driver routines used for solving generalized linear least-squaresproblems. Table 4-9 lists all such routines for Fortran-77 interface. Respective routine namesin Fortran-95 interface are without the first symbol (see Routine Naming Conventions).

909

Table 4-9 Driver Routines for Solving Generalized LLS Problems


Solves the linear equality-constrained least squares problem using ageneralized RQ factorization.

?gglse

Solves a general Gauss-Markov linear model problem using a generalizedQR factorization.

?ggglm

?gglseSolves the linear equality-constrained least squaresproblem using a generalized RQ factorization.

Syntax

Fortran 77:

call sgglse(m, n, p, a, lda, b, ldb, c, d, x, work, lwork, info)

call dgglse(m, n, p, a, lda, b, ldb, c, d, x, work, lwork, info)

call cgglse(m, n, p, a, lda, b, ldb, c, d, x, work, lwork, info)

call zgglse(m, n, p, a, lda, b, ldb, c, d, x, work, lwork, info)

Fortran 95:

call gglse(a, b, c, d, x [,info])

Description

This routine solves the linear equality-constrained least squares (LSE) problem:

minimize || c - A x ||2 subject to B x = d

where A is an m-by-n matrix, B is a p-by-n matrix, c is a given m-vector, and d is a given p-vector.

It is assumed that p ≤ n ≤ m+p, and

910


These conditions ensure that the LSE problem has a unique solution, which is obtained usinga generalized RQ factorization of the matrices (B, A) given by

B=(0 R)*Q, A=Z*T*Q

Input Parameters



(n ≥ 0).

n

INTEGER. The number of rows of the matrix Bp

(0 ≤ p ≤ n ≤ m+p).

REAL for sgglsea, b, c, d, workDOUBLE PRECISION for dgglseCOMPLEX for cgglseDOUBLE COMPLEX for zgglse.Arrays:a(lda,*) contains the m-by-n matrix A.The second dimension of a must be at least max(1, n).b(ldb,*) contains the p-by-n matrix B.The second dimension of b must be at least max(1, n).c(*), dimension at least max(1, m), contains the right handside vector for the least squares part of the LSE problem.d(*), dimension at least max(1, p), contains the right handside vector for the constrained equation.work is a workspace array, its dimension max(1, lwork).




lwork ≥ max(1, m+n+p).If lwork = -1, then a workspace query is assumed; theroutine only calculates the optimal size of the work array,returns this value as the first entry of the work array, andno error message related to lwork is issued by xerbla.See Application Notes for the suggested value of lwork.

911


Output Parameters

REAL for sgglsexDOUBLE PRECISION for dgglseCOMPLEX for cgglseDOUBLE COMPLEX for zgglse.Array, DIMENSION at least max(1, n). On exit, contains thesolution of the LSE problem.

On exit, the elements on and above the diagonal of thearray contain the min(m,n)-by-n upper trapezoidal matrixT.

a

On exit, the upper triangle of the subarray b(1:p,n-p+1:n) contains the p-by-p upper triangular matrix R.

b

On exit, d is destroyed.d

On exit, the residual sum-of-squares for the solution is givenby the sum of squares of elements n-p+1 to m of vector c.

c


work(1)

INTEGER.infoIf info = 0, the execution is successful.If info = -i, the i-th parameter had an illegal value.If info = 1, the upper triangular factor R associated withB in the generalized RQ factorization of the pair (B, A) issingular, so that rank(B) < P; the least squares solutioncould not be computed.If info = 2, the (n-p)-by-(n-p) part of the uppertrapezoidal factor T associated with A in the generalized RQfactorization of the pair (B, A) is singular, so that

; the least squares solution could not be computed.

912



Specific details for the routine gglse interface are the following:



Holds the vector of length (m).c

Holds the vector of length (p).d


Application Notes

For optimum performance, use

lwork ≥ p + min(m, n) + max(m, n)*nb,

where nb is an upper bound for the optimal blocksizes for ?geqrf, ?gerqf, ?ormqr/?unmqrand ?ormrq/?unmrq.

You may set lwork to -1. The routine returns immediately and provides the recommendedworkspace in the first element of the corresponding array (work). This operation is called aworkspace query.


913


?ggglmSolves a general Gauss-Markov linear modelproblem using a generalized QR factorization.

Syntax

Fortran 77:

call sggglm(n, m, p, a, lda, b, ldb, d, x, y, work, lwork, info)

call dggglm(n, m, p, a, lda, b, ldb, d, x, y, work, lwork, info)

call cggglm(n, m, p, a, lda, b, ldb, d, x, y, work, lwork, info)

call zggglm(n, m, p, a, lda, b, ldb, d, x, y, work, lwork, info)

Fortran 95:

call ggglm(a, b, d, x, y [,info])

Description

This routine solves a general Gauss-Markov linear model (GLM) problem:

minimizex || y ||2 subject to d = Ax + By

where A is an n-by-m matrix, B is an n-by-p matrix, and d is a given n-vector. It is assumed

that m ≤ n ≤ m+p, and rank(A) = m and rank( A B ) = n.

Under these assumptions, the constrained equation is always consistent, and there is a uniquesolution x and a minimal 2-norm solution y, which is obtained using a generalized QR factorizationof the matrices (A, B ) given by

In particular, if matrix B is square nonsingular, then the problem GLM is equivalent to thefollowing weighted linear least squares problem

minimizex || B-1(d-Ax)||2.

914


Input Parameters

INTEGER. The number of rows of the matrices A and B (n

≥ 0).

n

INTEGER. The number of columns in A (m ≥ 0).m

INTEGER. The number of columns in B (p ≥ n - m).p

REAL for sggglma, b, d, workDOUBLE PRECISION for dggglmCOMPLEX for cggglmDOUBLE COMPLEX for zggglm.Arrays:a(lda,*) contains the n-by-m matrix A.The second dimension of a must be at least max(1, m).b(ldb,*) contains the n-by-p matrix B.The second dimension of b must be at least max(1, p).d(*), dimension at least max(1, n), contains the left handside of the GLM equation.work is a workspace array, its dimension max(1, lwork).



INTEGER. The size of the work array; lwork ≥ max(1,n+m+p).

lwork


Output Parameters

REAL for sggglmx, yDOUBLE PRECISION for dggglmCOMPLEX for cggglmDOUBLE COMPLEX for zggglm.Arrays x(*), y(*). DIMENSION at least max(1, m) for x andat least max(1, p) for y.

915


On exit, x and y are the solutions of the GLM problem.

On exit, the upper triangular part of the array a containsthe m-by-m upper triangular matrix R.

a

On exit, if n ≤ p, the upper triangle of the subarrayb(1:n,p-n+1:p) contains the n-by-n upper triangularmatrix T; if n > p, the elements on and above the (n-p)-thsubdiagonal contain the n-by-p upper trapezoidal matrix T.

b

On exit, d is destroyedd

If info = 0, on exit, work(1) contains the minimum valueof lwork required for optimum performance.

work(1)

INTEGER.infoIf info = 0, the execution is successful.If info = -i, the i-th parameter had an illegal value.If info = 1, the upper triangular factor R associated withA in the generalized QR factorization of the pair (A, B) issingular, so that rank(A) < m; the least squares solutioncould not be computed.If info = 2, the bottom (n-m)-by-(n-m) part of the uppertrapezoidal factor T associated with B in the generalized QRfactorization of the pair (A, B) is singular, so that rank(AB) < n; the least squares solution could not be computed.



Specific details for the routine ggglm interface are the following:

Holds the matrix A of size (n,m).a

Holds the matrix B of size (n,p).b



Holds the vector of length (p).y

916

Application Notes

For optimum performance, use

lwork ≥ m + min(n, p) + max(n, p)*nb,

where nb is an upper bound for the optimal blocksizes for ?geqrf, ?gerqf, ?ormqr/?unmqrand ?ormrq/?unmrq.

You may set lwork to -1. The routine returns immediately and provides the recommendedworkspace in the first element of the corresponding array (work). This operation is called aworkspace query.



This section describes LAPACK driver routines used for solving symmetric eigenvalue problems.See also computational routines computational routines that can be called to solve theseproblems. Table 4-10 lists all such driver routines for Fortran-77 interface. Respective routinenames in Fortran-95 interface are without the first symbol (see Routine Naming Conventions).

Table 4-10 Driver Routines for Solving Symmetric Eigenproblems


Computes all eigenvalues and, optionally, eigenvectors of a real symmetric/ Hermitian matrix.

?syev/?heev

Computes all eigenvalues and (optionally) all eigenvectors of a realsymmetric / Hermitian matrix using divide and conquer algorithm.

?syevd/?heevd

Computes selected eigenvalues and, optionally, eigenvectors of asymmetric / Hermitian matrix.

?syevx/?heevx

Computes selected eigenvalues and, optionally, eigenvectors of a realsymmetric / Hermitian matrix using the Relatively Robust Representations.

?syevr/?heevr

Computes all eigenvalues and, optionally, eigenvectors of a real symmetric/ Hermitian matrix in packed storage.

?spev/?hpev

917



Uses divide and conquer algorithm to compute all eigenvalues and(optionally) all eigenvectors of a real symmetric / Hermitian matrix heldin packed storage.

?spevd/?hpevd

Computes selected eigenvalues and, optionally, eigenvectors of a realsymmetric / Hermitian matrix in packed storage.

?spevx/?hpevx

Computes all eigenvalues and, optionally, eigenvectors of a real symmetric/ Hermitian band matrix.

?sbev /?hbev

Computes all eigenvalues and (optionally) all eigenvectors of a realsymmetric / Hermitian band matrix using divide and conquer algorithm.

?sbevd/?hbevd

Computes selected eigenvalues and, optionally, eigenvectors of a realsymmetric / Hermitian band matrix.

?sbevx/?hbevx

Computes all eigenvalues and, optionally, eigenvectors of a real symmetrictridiagonal matrix.

?stev

Computes all eigenvalues and (optionally) all eigenvectors of a realsymmetric tridiagonal matrix using divide and conquer algorithm.

?stevd

Computes selected eigenvalues and eigenvectors of a real symmetrictridiagonal matrix.

?stevx

Computes selected eigenvalues and, optionally, eigenvectors of a realsymmetric tridiagonal matrix using the Relatively Robust Representations.

?stevr

?syevComputes all eigenvalues and, optionally,eigenvectors of a real symmetric matrix.

Syntax

Fortran 77:

call ssyev(jobz, uplo, n, a, lda, w, work, lwork, info)

call dsyev(jobz, uplo, n, a, lda, w, work, lwork, info)

918


Fortran 95:

call syev(a, w [,jobz] [,uplo] [,info])

Description

This routine computes all eigenvalues and, optionally, eigenvectors of a real symmetric matrixA.

Note that for most cases of real symmetric eigenvalue problems the default choice should be?syevr function as its underlying algorithm is faster and uses less workspace.

Input Parameters

CHARACTER*1. Must be 'N' or 'V'.jobzIf jobz = 'N', then only eigenvalues are computed.If jobz = 'V', then eigenvalues and eigenvectors arecomputed.



REAL for ssyeva, workDOUBLE PRECISION for dsyevArrays:a(lda,*) is an array containing either upper or lowertriangular part of the symmetric matrix A, as specified byuplo.The second dimension of a must be at least max(1, n).work is a workspace array, its dimension max(1, lwork).

INTEGER. The first dimension of the array a.ldaMust be at least max(1, n).

INTEGER.lworkThe dimension of the array work.

Constraint: lwork ≥ max(1, 3n-1).If lwork = -1, then a workspace query is assumed; theroutine only calculates the optimal size of the work array,returns this value as the first entry of the work array, andno error message related to lwork is issued by xerbla.

919


See Application Notes for the suggested value of lwork.

Output Parameters

On exit, if jobz = 'V', then if info = 0, array a containsthe orthonormal eigenvectors of the matrix A.

a

If jobz = 'N', then on exit the lower triangle(if uplo = 'L') or the upper triangle (if uplo = 'U') ofA, including the diagonal, is overwritten.

REAL for ssyevwDOUBLE PRECISION for dsyevArray, DIMENSION at least max(1, n).If info = 0, contains the eigenvalues of the matrix A inascending order.

On exit, if lwork > 0, then work(1) returns the requiredminimal size of lwork.

work(1)

INTEGER.infoIf info = 0, the execution is successful.If info = -i, the i-th parameter had an illegal value.If info = i, then the algorithm failed to converge; iindicates the number of elements of an intermediatetridiagonal form which did not converge to zero.



Specific details for the routine syev interface are the following:



Must be 'N' or 'V'. The default value is 'N'.job


Application Notes

For optimum performance use

920


lwork ≥ (nb+2)*n,

where nb is the blocksize for ?sytrd returned by ilaenv.





?heevComputes all eigenvalues and, optionally,eigenvectors of a Hermitian matrix.

Syntax

Fortran 77:

call cheev(jobz, uplo, n, a, lda, w, work, lwork, rwork, info)

call zheev(jobz, uplo, n, a, lda, w, work, lwork, rwork, info)

Fortran 95:

call heev(a, w [,jobz] [,uplo] [,info])

Description

This routine computes all eigenvalues and, optionally, eigenvectors of a complex Hermitianmatrix A.

Note that for most cases of complex Hermitian eigenvalue problems the default choice shouldbe ?heevr function as its underlying algorithm is faster and uses less workspace.

921


Input Parameters




COMPLEX for cheeva, workDOUBLE COMPLEX for zheevArrays:a(lda,*) is an array containing either upper or lowertriangular part of the Hermitian matrix A, as specified byuplo.The second dimension of a must be at least max(1, n).work is a workspace array, its dimension max(1, lwork).

INTEGER. The first dimension of the array a. Must be atleast max(1, n).

lda

INTEGER.lworkThe dimension of the array work. C

onstraint: lwork ≥ max(1, 2n-1).If lwork = -1, then a workspace query is assumed; theroutine only calculates the optimal size of the work array,returns this value as the first entry of the work array, andno error message related to lwork is issued by xerbla.See Application Notes for the suggested value of lwork.

REAL for cheevrworkDOUBLE PRECISION for zheev.Workspace array, DIMENSION at least max(1, 3n-2).

Output Parameters

On exit, if jobz = 'V', then if info = 0, array a containsthe orthonormal eigenvectors of the matrix A.

a

If jobz = 'N', then on exit the lower triangle

922


(if uplo = 'L') or the upper triangle (if uplo = 'U') ofA, including the diagonal, is overwritten.

REAL for cheevwDOUBLE PRECISION for zheevArray, DIMENSION at least max(1, n).If info = 0, contains the eigenvalues of the matrix A inascending order.


work(1)




Specific details for the routine heev interface are the following:



Must be 'N' or 'V'. The default value is 'N'.job


Application Notes

For optimum performance use

lwork ≥ (nb+1)*n,

where nb is the blocksize for ?hetrd returned by ilaenv.


923





?syevdComputes all eigenvalues and (optionally) alleigenvectors of a real symmetric matrix usingdivide and conquer algorithm.

Syntax

Fortran 77:

call ssyevd(jobz, uplo, n, a, lda, w, work, lwork, iwork, liwork, info)

call dsyevd(jobz, uplo, n, a, lda, w, work, lwork, iwork, liwork, info)

Fortran 95:

call syevd(a, w [,jobz] [,uplo] [,info])

Description

This routine computes all the eigenvalues, and optionally all the eigenvectors, of a real symmetric

matrix A. In other words, it can compute the spectral factorization of A as: A = ZΛZT.

Here Λ is a diagonal matrix whose diagonal elements are the eigenvalues λi, and Z is theorthogonal matrix whose columns are the eigenvectors zi. Thus,

Azi = λizi for i = 1, 2, ..., n.

If the eigenvectors are requested, then this routine uses a divide and conquer algorithm tocompute eigenvalues and eigenvectors. However, if only eigenvalues are required, then it usesthe Pal-Walker-Kahan variant of the QL or QR algorithm.

924


Note that for most cases of real symmetric eigenvalue problems the default choice should be?syevr function as its underlying algorithm is faster and uses less workspace. ?syevd requiresmore workspace but is faster in some cases, especially for large matrices.

Input Parameters




REAL for ssyevdaDOUBLE PRECISION for dsyevdArray, DIMENSION (lda, *).a(lda,*) is an array containing either upper or lowertriangular part of the symmetric matrix A, as specified byuplo.The second dimension of a must be at least max(1, n).


REAL for ssyevdworkDOUBLE PRECISION for dsyevd.Workspace array, DIMENSION at least lwork.

INTEGER.lworkThe dimension of the array work.Constraints:

if n ≤ 1, then lwork ≥ 1;

if jobz = 'N' and n > 1, then lwork ≥ 2*n + 1;

if jobz = 'V' and n > 1, then lwork ≥ 2*n2+ 6*n + 1.If lwork = -1, then a workspace query is assumed; theroutine only calculates the required sizes of the work andiwork arrays, returns these values as the first entries of the

925


work and iwork arrays, and no error message related tolwork or liwork is issued by xerbla. See Application Notesfor details.

INTEGER.iworkWorkspace array, its dimension max(1, liwork).

INTEGER.liworkThe dimension of the array iwork.Constraints:

if n ≤ 1, then liwork ≥ 1;

if jobz = 'N' and n > 1, then liwork ≥ 1;

if jobz = 'V' and n > 1, then liwork ≥ 5*n + 3.If liwork = -1, then a workspace query is assumed; theroutine only calculates the required sizes of the work andiwork arrays, returns these values as the first entries of thework and iwork arrays, and no error message related tolwork or liwork is issued by xerbla. See Application Notesfor details.

Output Parameters

REAL for ssyevdwDOUBLE PRECISION for dsyevdArray, DIMENSION at least max(1, n).If info = 0, contains the eigenvalues of the matrix A inascending order. See also info.

If jobz = 'V', then on exit this array is overwritten by theorthogonal matrix Z which contains the eigenvectors of A.

a


work(1)

On exit, if liwork > 0, then iwork(1) returns the requiredminimal size of liwork.

iwork(1)


926


If info = i, then the algorithm failed to converge; iindicates the number of off-diagonal elements of anintermediate tridiagonal form which did not converge tozero.If info = i, and jobz = 'V', then the algorithm failed tocompute an eigenvalue while working on the submatrix lyingin rows and columns info/(n+1) through mod(info,n+1).If info = -i, the i-th parameter had an illegal value.



Specific details for the routine syevd interface are the following:



Must be 'N' or 'V'. The default value is 'N'.jobz


Application Notes






927



The complex analogue of this routine is ?heevd

?heevdComputes all eigenvalues and (optionally) alleigenvectors of a complex Hermitian matrix usingdivide and conquer algorithm.

Syntax

Fortran 77:

call cheevd(jobz, uplo, n, a, lda, w, work, lwork, rwork, lrwork, iwork,liwork, info)

call zheevd(jobz, uplo, n, a, lda, w, work, lwork, rwork, lrwork, iwork,liwork, info)

Fortran 95:

call heevd(a, w [,job] [,uplo] [,info])

Description

This routine computes all the eigenvalues, and optionally all the eigenvectors, of a complex

Hermitian matrix A. In other words, it can compute the spectral factorization of A as: A = ZΛZH.

Here Λ is a real diagonal matrix whose diagonal elements are the eigenvalues λi, and Z is the(complex) unitary matrix whose columns are the eigenvectors zi. Thus,

Azi = λizi for i = 1, 2, ..., n.


Note that for most cases of complex Hermetian eigenvalue problems the default choice shouldbe ?heevr function as its underlying algorithm is faster and uses less workspace. ?heevdrequires more workspace but is faster in some cases, especially for large matrices.

928


Input Parameters




COMPLEX for cheevdaDOUBLE COMPLEX for zheevdArray, DIMENSION (lda, *).a(lda,*) is an array containing either upper or lowertriangular part of the Hermitian matrix A, as specified byuplo.The second dimension of a must be at least max(1, n).


lda

COMPLEX for cheevdworkDOUBLE COMPLEX for zheevd.Workspace array, DIMENSION max(1, lwork).

INTEGER.lworkThe dimension of the array work. Constraints:


if jobz = 'N' and n > 1, then lwork ≥ n+1;

if jobz = 'V' and n > 1, then lwork ≥ n2+2*n.If lwork = -1, then a workspace query is assumed; theroutine only calculates the optimal size of the work, rworkand iwork arrays, returns these values as the first entriesof the work, rwork and iwork arrays, and no error messagerelated to lwork or lrwork or liwork is issued by xerbla.See Application Notes for details.

REAL for cheevdrworkDOUBLE PRECISION for zheevd

929


Workspace array, DIMENSION at least lrwork.

INTEGER.lrworkThe dimension of the array rwork. Constraints:

if n ≤ 1, then lrwork ≥ 1;

if job = 'N' and n > 1, then lrwork ≥ n;

if job = 'V' and n > 1, then lrwork ≥ 2*n2+ 5*n + 1.If lrwork = -1, then a workspace query is assumed; theroutine only calculates the optimal size of the work, rworkand iwork arrays, returns these values as the first entriesof the work, rwork and iwork arrays, and no error messagerelated to lwork or lrwork or liwork is issued by xerbla.See Application Notes for details.


INTEGER.liwork

The dimension of the array iwork. Constraints: if n ≤ 1, then

liwork ≥ 1;


if jobz = 'V' and n > 1, then liwork ≥ 5*n+3.If liwork = -1, then a workspace query is assumed; theroutine only calculates the optimal size of the work, rworkand iwork arrays, returns these values as the first entriesof the work, rwork and iwork arrays, and no error messagerelated to lwork or lrwork or liwork is issued by xerbla.See Application Notes for details.

Output Parameters

REAL for cheevdwDOUBLE PRECISION for zheevdArray, DIMENSION at least max(1, n).If info = 0, contains the eigenvalues of the matrix A inascending order. See also info.

If jobz = 'V', then on exit this array is overwritten by theunitary matrix Z which contains the eigenvectors of A.

a

930


On exit, if lwork > 0, then the real part of work(1) returnsthe required minimal size of lwork.

work(1)

On exit, if lrwork > 0, then rwork(1) returns the requiredminimal size of lrwork.

rwork(1)


iwork(1)

INTEGER.infoIf info = 0, the execution is successful.If info = i, and jobz = 'N', then the algorithm failed toconverge; i off-diagonal elements of an intermediatetridiagonal form did not converge to zero;if info = i, and jobz = 'V', then the algorithm failed tocompute an eigenvalue while working on the submatrix lyingin rows and columns info/(n+1) through mod(info, n+1).If info = -i, the i-th parameter had an illegal value.



Specific details for the routine heevd interface are the following:





Application Notes

The computed eigenvalues and eigenvectors are exact for a matrix A + E such that ||E||2 =

O(ε) ||A||2, where ε is the machine precision.

If you are in doubt how much workspace to supply, use a generous value of lwork (liwork orlrwork) for the first run or set lwork = -1 (liwork = -1, lrwork = -1).

931


If you choose the first option and set any of admissible lwork (liwork or lrwork) sizes, whichis no less than the minimal value described, the routine completes the task, though probablynot so fast as with a recommended workspace, and provides the recommended workspace inthe first element of the corresponding array (work, iwork, rwork) on exit. Use this value(work(1), iwork(1), rwork(1)) for subsequent runs.

If you set lwork = -1 (liwork = -1, lrwork = -1), the routine returns immediately andprovides the recommended workspace in the first element of the corresponding array (work,iwork, rwork). This operation is called a workspace query.


The real analogue of this routine is ?syevd. See also ?hpevd for matrices held in packed storage,and ?hbevd for banded matrices.

?syevxComputes selected eigenvalues and, optionally,eigenvectors of a symmetric matrix.

Syntax

Fortran 77:

call ssyevx(jobz, range, uplo, n, a, lda, vl, vu, il, iu, abstol, m, w, z,ldz, work, lwork, iwork, ifail, info)

call dsyevx(jobz, range, uplo, n, a, lda, vl, vu, il, iu, abstol, m, w, z,ldz, work, lwork, iwork, ifail, info)

Fortran 95:

call syevx(a, w [,uplo] [,z] [,vl] [,vu] [,il] [,iu] [,m] [,ifail] [,abstol][,info])

Description

This routine computes selected eigenvalues and, optionally, eigenvectors of a real symmetricmatrix A. Eigenvalues and eigenvectors can be selected by specifying either a range of valuesor a range of indices for the desired eigenvalues.

932


Note that for most cases of real symmetric eigenvalue problems the default choice should be?syevr function as its underlying algorithm is faster and uses less workspace. ?syevx is fasterfor a few selected eigenvalues.

Input Parameters


CHARACTER*1. Must be 'A', 'V', or 'I'.rangeIf range = 'A', all eigenvalues will be found.If range = 'V', all eigenvalues in the half-open interval(vl, vu] will be found.If range = 'I', the eigenvalues with indices il throughiu will be found.



REAL for ssyevxa, workDOUBLE PRECISION for dsyevx.Arrays:a(lda,*) is an array containing either upper or lowertriangular part of the symmetric matrix A, as specified byuplo.The second dimension of a must be at least max(1, n).work is a workspace array, its dimension max(1, lwork).


lda

REAL for ssyevxvl, vuDOUBLE PRECISION for dsyevx.If range = 'V', the lower and upper bounds of the interval

to be searched for eigenvalues; vl ≤ vu. Not referenced ifrange = 'A'or 'I'.

INTEGER.il, iu

933


If range = 'I', the indices of the smallest and largesteigenvalues to be returned.

Constraints: 1 ≤ il ≤ iu ≤ n, if n > 0;il = 1 and iu = 0 , if n = 0.Not referenced if range = 'A'or 'V'.

REAL for ssyevxabstolDOUBLE PRECISION for dsyevx.The absolute error tolerance for the eigenvalues. SeeApplication Notes for more information.

INTEGER. The first dimension of the output array z; ldz ≥1.

ldz

If jobz = 'V', then ldz ≥ max(1, n).


If n ≤ 1 then lwork ≥ 1, otherwise lwork=8*n.If lwork = -1, then a workspace query is assumed; theroutine only calculates the optimal size of the work array,returns this value as the first entry of the work array, andno error message related to lwork is issued by xerbla.See Application Notes for the suggested value of lwork.

INTEGER. Workspace array, DIMENSION at least max(1, 5n).iwork

Output Parameters

On exit, the lower triangle (if uplo = 'L') or the uppertriangle (if uplo = 'U') of A, including the diagonal, isoverwritten.

a

INTEGER. The total number of eigenvalues found;m


REAL for ssyevxwDOUBLE PRECISION for dsyevxArray, DIMENSION at least max(1, n). The first m elementscontain the selected eigenvalues of the matrix A in ascendingorder.

934


REAL for ssyevxzDOUBLE PRECISION for dsyevx.Array z(ldz,*) contains eigenvectors.The second dimension of z must be at least max(1, m).If jobz = 'V', then if info = 0, the first m columns of zcontain the orthonormal eigenvectors of the matrix Acorresponding to the selected eigenvalues, with the i-thcolumn of z holding the eigenvector associated with w(i).If an eigenvector fails to converge, then that column of zcontains the latest approximation to the eigenvector, andthe index of the eigenvector is returned in ifail.If jobz = 'N', then z is not referenced.Note: you must ensure that at least max(1,m) columns aresupplied in the array z; if range = 'V', the exact value ofm is not known in advance and an upper bound must beused.


work(1)

INTEGER.ifailArray, DIMENSION at least max(1, n).If jobz = 'V', then if info = 0, the first m elements ofifail are zero; if info > 0, then ifail contains the indicesof the eigenvectors that failed to converge.If jobz = 'V', then ifail is not referenced.

INTEGER.infoIf info = 0, the execution is successful.If info = -i, the i-th parameter had an illegal value.If info = i, then i eigenvectors failed to converge; theirindices are stored in the array ifail.



Specific details for the routine syevx interface are the following:


935




Holds the vector of length (n).ifail


Default value for this element is vl = -HUGE(vl).vl

Default value for this element is vu = HUGE(vl).vu



Default value for this element is abstol = 0.0_WP.abstol

Restored based on the presence of the argument z as follows: jobz= 'V', if z is present, jobz = 'N', if z is omitted Note that there willbe an error condition if ifail is present and z is omitted.

jobz

Restored based on the presence of arguments vl, vu, il, iu as follows:range = 'V', if one of or both vl and vu are present, range = 'I',if one of or both il and iu are present, range = 'A', if none of vl,

range

vu, il, iu is present, Note that there will be an error condition if oneof or both vl and vu are present and at the same time one of or bothil and iu are present.

Application Notes

For optimum performance use lwork ≥ (nb+3)*n, where nb is the maximum of the blocksizefor ?sytrd and ?ormtr returned by ilaenv.




936



An approximate eigenvalue is accepted as converged when it is determined to lie in an interval

[a,b] of width less than or equal to abstol + ε * max( |a|,|b| ), where ε is the machineprecision.

If abstol is less than or equal to zero, then ε*|T| is used in its place, where|T| is the 1-normof the tridiagonal matrix obtained by reducing A to tridiagonal form. Eigenvalues will be computedmost accurately when abstol is set to twice the underflow threshold 2*slamch('S'), not zero.

If this routine returns with info > 0, indicating that some eigenvectors did not converge, trysetting abstol to 2*slamch('S').

?heevxComputes selected eigenvalues and, optionally,eigenvectors of a Hermitian matrix.

Syntax

Fortran 77:

call cheevx(jobz, range, uplo, n, a, lda, vl, vu, il, iu, abstol, m, w, z,ldz, work, lwork, rwork, iwork, ifail, info)

call zheevx(jobz, range, uplo, n, a, lda, vl, vu, il, iu, abstol, m, w, z,ldz, work, lwork, rwork, iwork, ifail, info)

Fortran 95:

call heevx(a, w [,uplo] [,z] [,vl] [,vu] [,il] [,iu] [,m] [,ifail] [,abstol][,info])

Description

This routine computes selected eigenvalues and, optionally, eigenvectors of a complex Hermitianmatrix A. Eigenvalues and eigenvectors can be selected by specifying either a range of valuesor a range of indices for the desired eigenvalues.

Note that for most cases of complex Hermetian eigenvalue problems the default choice shouldbe ?heevr function as its underlying algorithm is faster and uses less workspace. ?heevx isfaster for a few selected eigenvalues.

937


Input Parameters


CHARACTER*1. Must be 'A', 'V', or 'I'.rangeIf range = 'A', all eigenvalues will be found.If range = 'V', all eigenvalues in the half-open interval(vl, vu] will be found.If range = 'I', the eigenvalues with indices il throughiu will be found.


INTEGER. The order of the matrix A (n < 0).n

COMPLEX for cheevxa, workDOUBLE COMPLEX for zheevx.Arrays:a(lda,*) is an array containing either upper or lowertriangular part of the Hermitian matrix A, as specified byuplo.The second dimension of a must be at least max(1, n).work is a workspace array, its dimension max(1, lwork).


lda

REAL for cheevxvl, vuDOUBLE PRECISION for zheevx.If range = 'V', the lower and upper bounds of the interval

to be searched for eigenvalues; vl ≤ vu. Not referenced ifrange = 'A'or 'I'.

INTEGER.il, iuIf range = 'I', the indices of the smallest and largesteigenvalues to be returned. Constraints:

1 ≤ il ≤ iu ≤ n, if n > 0;il = 1 and iu = 0, if n = 0. Notreferenced if range = 'A'or 'V'.

938

REAL for cheevxabstolDOUBLE PRECISION for zheevx. The absolute error tolerancefor the eigenvalues. See Application Notes for moreinformation.

INTEGER. The first dimension of the output array z; ldz ≥1.

ldz

If jobz = 'V', then ldz ≥max(1, n).


lwork ≥ 1 if n≤1; otherwise at least 2*n.If lwork = -1, then a workspace query is assumed; theroutine only calculates the optimal size of the work array,returns this value as the first entry of the work array, andno error message related to lwork is issued by xerbla.See Application Notes for the suggested value of lwork.

REAL for cheevxrworkDOUBLE PRECISION for zheevx.Workspace array, DIMENSION at least max(1, 7n).


Output Parameters


a

INTEGER. The total number of eigenvalues found;0 ≤ m ≤ n.m

If range = 'A', m = n, and if range = 'I', m = iu-il+1.

REAL for cheevxwDOUBLE PRECISION for zheevxArray, DIMENSION at least max(1, n). The first m elementscontain the selected eigenvalues of the matrix A in ascendingorder.

COMPLEX for cheevxzDOUBLE COMPLEX for zheevx.Array z(ldz,*) contains eigenvectors.The second dimension of z must be at least max(1, m).

939


If jobz = 'V', then if info = 0, the first m columns of zcontain the orthonormal eigenvectors of the matrix Acorresponding to the selected eigenvalues, with the i-thcolumn of z holding the eigenvector associated with w(i).If an eigenvector fails to converge, then that column of zcontains the latest approximation to the eigenvector, andthe index of the eigenvector is returned in ifail.If jobz = 'N', then z is not referenced. Note: you mustensure that at least max(1,m) columns are supplied in thearray z; if range = 'V', the exact value of m is not knownin advance and an upper bound must be used.


work(1)

INTEGER.ifailArray, DIMENSION at least max(1, n).If jobz = 'V', then if info = 0, the first m elements ofifail are zero; if info > 0, then ifail contains the indicesof the eigenvectors that failed to converge.If jobz = 'V', then ifail is not referenced.




Specific details for the routine heevx interface are the following:



Holds the matrix Z of size (n, n).z



940







Restored based on the presence of the argument z as follows: jobz= 'V', if z is present, jobz = 'N', if z is omitted Note that there willbe an error condition if ifail is present and z is omitted.

jobz


range


Application Notes

For optimum performance use lwork ≥ (nb+1)*n, where nb is the maximum of the blocksizefor ?hetrd and ?unmtr returned by ilaenv.






[a,b] of width less than or equal to abstol + ε * max( |a|,|b| ) , where ε is the machineprecision.

941


If abstol is less than or equal to zero, then ε*|T| will be used in its place, where |T| is the1-norm of the tridiagonal matrix obtained by reducing A to tridiagonal form. Eigenvalues willbe computed most accurately when abstol is set to twice the underflow threshold 2*slamch('S'),not zero.

If this routine returns with info > 0, indicating that some eigenvectors did not converge, trysetting abstol to 2*slamch('S').

?syevrComputes selected eigenvalues and, optionally,eigenvectors of a real symmetric matrix using theRelatively Robust Representations.

Syntax

Fortran 77:

call ssyevr(jobz, range, uplo, n, a, lda, vl, vu, il, iu, abstol, m, w, z,ldz, isuppz, work, lwork, iwork, liwork, info)

call dsyevr(jobz, range, uplo, n, a, lda, vl, vu, il, iu, abstol, m, w, z,ldz, isuppz, work, lwork, iwork, liwork, info)

Fortran 95:

call syevr(a, w [,uplo] [,z] [,vl] [,vu] [,il] [,iu] [,m] [,isuppz] [,abstol][,info])

Description

This routine computes selected eigenvalues and, optionally, eigenvectors of a real symmetricmatrix A. Eigenvalues and eigenvectors can be selected by specifying either a range of valuesor a range of indices for the desired eigenvalues.

The routine first reduces the matrix A to tridiagonal form T with a call to ?sytrd. Then, wheneverpossible, ?syevr calls ?stemr to compute the eigenspectrum using Relatively RobustRepresentations. ?stemr computes eigenvalues by the dqds algorithm, while orthogonaleigenvectors are computed from various “good” L*D*LT representations (also known as RelativelyRobust Representations). Gram-Schmidt orthogonalization is avoided as far as possible. Morespecifically, the various steps of the algorithm are as follows. For the each unreduced block ofT,

942


a. Compute T - σ*I = L*D*LT, so that L and D define all the wanted eigenvalues to highrelative accuracy. This means that small relative changes in the entries of D and L causeonly small relative changes in the eigenvalues and eigenvectors. The standard (unfactored)representation of the tridiagonal matrix T does not have this property in general;

b. Compute the eigenvalues to suitable accuracy. If the eigenvectors are desired, the algorithmattains full accuracy of the computed eigenvalues only right before the corresponding vectorshave to be computed, see steps c) and d);

c. For each cluster of close eigenvalues, select a new shift close to the cluster, find a newfactorization, and refine the shifted eigenvalues to suitable accuracy;

d. For each eigenvalue with a large enough relative separation compute the correspondingeigenvector by forming a rank revealing twisted factorization. Go back to c) for any clustersthat remain.

The desired accuracy of the output can be specified by the input parameter abstol.

The routine ?syevr calls ?stemr when the full spectrum is requested on machines that conformto the IEEE-754 floating point standard. ?syevr calls ?stebz and ?stein on non-IEEE machinesand when partial spectrum requests are made.

Note that ?syevr is preferable for most cases of real symmetric eigenvalue problems as itsunderlying algorithm is fast and uses less workspace.

Input Parameters



vl< lambda(i) ≤ vu.If range = 'I', the routine computes eigenvalues withindices il to iu.For range = 'V'or 'I' and iu-il < n-1, sstebz/dstebzand sstein/dstein are called.


943

If uplo = 'U', a stores the upper triangular part of A.If uplo = 'L', a stores the lower triangular part of A.


REAL for ssyevra, workDOUBLE PRECISION for dsyevr.Arrays:a(lda,*) is an array containing either upper or lowertriangular part of the symmetric matrix A, as specified byuplo.The second dimension of a must be at least max(1, n).work is a workspace array, its dimension max(1, lwork).


lda

REAL for ssyevrvl, vuDOUBLE PRECISION for dsyevr.If range = 'V', the lower and upper bounds of the intervalto be searched for eigenvalues.Constraint: vl< vu.If range = 'A' or 'I', vl and vu are not referenced.

INTEGER.il, iuIf range = 'I', the indices in ascending order of thesmallest and largest eigenvalues to be returned.Constraint:

1 ≤ il ≤ iu ≤ n, if n > 0;il=1 and iu=0, if n = 0.If range = 'A' or 'V', il and iu are not referenced.

REAL for ssyevrabstolDOUBLE PRECISION for dsyevr. The absolute error toleranceto which each eigenvalue/eigenvector is required.If jobz = 'V', the eigenvalues and eigenvectors outputhave residual norms bounded by abstol, and the dotproducts between different eigenvectors are bounded byabstol.

944

If abstol < n *eps*|T|, then n *eps*|T| will be usedin its place, where eps is the machine precision, and|T| is the 1-norm of the matrix T. The eigenvaluesare computed to an accuracy of eps*|T| irrespectiveof abstol.If high relative accuracy is important, set abstol to?lamch('S').

INTEGER. The leading dimension of the output array z.ldzConstraints:

ldz ≥ 1 if jobz = 'N';ldz < max(1, n) if jobz = 'V'.


Constraint: lwork ≥ max(1, 26n).If lwork = -1, then a workspace query is assumed; theroutine only calculates the optimal size of the work array,returns this value as the first entry of the work array, andno error message related to lwork is issued by xerbla.See Application Notes for the suggested value of lwork.


INTEGER.liwork

The dimension of the array iwork, lwork ≥ max(1, 10n).If liwork = -1, then a workspace query is assumed; theroutine only calculates the optimal size of the iwork array,returns this value as the first entry of the iwork array, andno error message related to liwork is issued by xerbla.

Output Parameters


a

INTEGER. The total number of eigenvalues found, 0 ≤ m ≤n.

m


REAL for ssyevrw, z

945

DOUBLE PRECISION for dsyevr.Arrays:w(*), DIMENSION at least max(1, n), contains the selectedeigenvalues in ascending order, stored in w(1) to w(m);z(ldz, *), the second dimension of z must be at leastmax(1, m).If jobz = 'V', then if info = 0, the first m columns of zcontain the orthonormal eigenvectors of the matrix Tcorresponding to the selected eigenvalues, with the i-thcolumn of z holding the eigenvector associated with w(i).If jobz = 'N', then z is not referenced. Note that you mustensure that at least max(1, m) columns are supplied in thearray z ; if range = 'V', the exact value of m is not knownin advance and an upper bound must be used.

INTEGER.isuppzArray, DIMENSION at least 2 *max(1, m).The support of the eigenvectors in z, i.e., the indicesindicating the nonzero elements in z. The i-th eigenvectoris nonzero only in elements isuppz( 2i-1) through isuppz(2i ). Currently is not implemented, nor referenced.


work(1)


iwork(1)

INTEGER.infoIf info = 0, the execution is successful.If info = -i, the i-th parameter had an illegal value.If info = i, an internal error has occurred.



Specific details for the routine syevr interface are the following:



946


Holds the matrix Z of size (n, n), where the values n and m aresignificant.

z

Holds the vector of length (2*m), where the values (2*m) are significant.isuppz







Restored based on the presence of the argument z as follows: jobz= 'V', if z is present, jobz = 'N', if z is omitted Note that there willbe an error condition if isuppz is present and z is omitted.

jobz


range


Application Notes

For optimum performance use lwork ≥ (nb+6)*n, where nb is the maximum of the blocksizefor ?sytrd and ?ormtr returned by ilaenv.




947



Normal execution of ?stegr may create NaNs and infinities and hence may abort due to afloating point exception in environments which do not handle NaNs and infinities in the IEEEstandard default manner.

?heevrComputes selected eigenvalues and, optionally,eigenvectors of a Hermitian matrix using theRelatively Robust Representations.

Syntax

Fortran 77:

call cheevr(jobz, range, uplo, n, a, lda, vl, vu, il, iu, abstol, m, w, z,ldz, isuppz, work, lwork, rwork, lrwork, iwork, liwork, info)

call zheevr(jobz, range, uplo, n, a, lda, vl, vu, il, iu, abstol, m, w, z,ldz, isuppz, work, lwork, rwork, lrwork, iwork, liwork, info)

Fortran 95:

call heevr(a, w [,uplo] [,z] [,vl] [,vu] [,il] [,iu] [,m] [,isuppz] [,abstol][,info])

Description

This routine computes selected eigenvalues and, optionally, eigenvectors of a complex Hermitianmatrix A. Eigenvalues and eigenvectors can be selected by specifying either a range of valuesor a range of indices for the desired eigenvalues.

THe routine first reduces the matrix A to tridiagonal form T with a call to ?HETRD. Then,whenever possible, ?heevr calls ?stegr to compute the eigenspectrum using Relatively RobustRepresentations. ?stegr computes eigenvalues by the dqds algorithm, while orthogonaleigenvectors are computed from various “good” L*D*LT representations (also known as RelativelyRobust Representations). Gram-Schmidt orthogonalization is avoided as far as possible. Morespecifically, the various steps of the algorithm are as follows. For each unreduced block(submatrix) of T,

948


a. Compute T - σ*I = L*D* LT, so that L and D define all the wanted eigenvalues to highrelative accuracy. This means that small relative changes in the entries of D and L causeonly small relative changes in the eigenvalues and eigenvectors. The standard (unfactored)representation of the tridiagonal matrix T does not have this property in general;

b. compute the eigenvalues to suitable accuracy. If the eigenvectors are desired, the algorithmattains full accuracy of the computed eigenvalues only right before the corresponding vectorshave to be computed, see steps c) and d);

c. for each cluster of close eigenvalues, select a new shift close to the cluster, find a newfactorization, and refine the shifted eigenvalues to suitable accuracy;

d. For each eigenvalue with a large enough relative separation compute the correspondingeigenvector by forming a rank revealing twisted factorization. Go back to the step c) for anyclusters that remain.


The routine ?heevr calls ?stemr when the full spectrum is requested on machines whichconform to the IEEE-754 floating point standard, or ?stebz and ?stein on non-IEEE machinesand when partial spectrum requests are made.

Note that the routine ?heevr is preferable for most cases of complex Hermitian eigenvalueproblems as its underlying algorithm is fast and uses less workspace.

Input Parameters



lambda(i) in the half-open interval: vl< lambda(i) ≤vu.If range = 'I', the routine computes eigenvalues withindices il to iu.For range = 'V'or 'I', sstebz/dstebz andcstein/zstein are called.

CHARACTER*1. Must be 'U' or 'L'.uploIf uplo = 'U', a stores the upper triangular part of A.

949

If uplo = 'L', a stores the lower triangular part of A.


COMPLEX for cheevra, workDOUBLE COMPLEX for zheevr.Arrays:a(lda,*) is an array containing either upper or lowertriangular part of the Hermitian matrix A, as specified byuplo.The second dimension of a must be at least max(1, n).work is a workspace array, its dimension max(1, lwork).


REAL for cheevrvl, vuDOUBLE PRECISION for zheevr.If range = 'V', the lower and upper bounds of the intervalto be searched for eigenvalues.Constraint: vl< vu.If range = 'A' or 'I', vl and vu are not referenced.


Constraint: 1 ≤ il ≤ iu ≤ n, if n > 0; il=1 and iu=0 ifn = 0.If range = 'A' or 'V', il and iu are not referenced.

REAL for cheevrabstolDOUBLE PRECISION for zheevr.The absolute error tolerance to which eacheigenvalue/eigenvector is required.If jobz = 'V', the eigenvalues and eigenvectors outputhave residual norms bounded by abstol, and the dotproducts between different eigenvectors are bounded byabstol.If abstol < n *eps*|T|, then n *eps*|T| will be usedin its place, where eps is the machine precision, and |T| isthe 1-norm of the matrix T. The eigenvalues are computedto an accuracy of eps*|T| irrespective of abstol.

950

If high relative accuracy is important, set abstol to?lamch('S').

INTEGER. The leading dimension of the output array z.Constraints:

ldz

ldz ≥ 1 if jobz = 'N';

ldz ≥ max(1, n) if jobz = 'V'.


Constraint: lwork ≥ max(1, 2n).If lwork = -1, then a workspace query is assumed; theroutine only calculates the optimal size of the work, rworkand iwork arrays, returns these values as the first entriesof the work, rwork and iwork arrays, and no error messagerelated to lwork or lrwork or liwork is issued by xerbla.See Application Notes for the suggested value of lwork.

REAL for cheevrrworkDOUBLE PRECISION for zheevr.Workspace array, DIMENSION max(1, lwork).

INTEGER.lrworkThe dimension of the array rwork;

lwork ≥ max(1, 24n).If lrwork = -1, then a workspace query is assumed; theroutine only calculates the optimal size of the work, rworkand iwork arrays, returns these values as the first entriesof the work, rwork and iwork arrays, and no error messagerelated to lwork or lrwork or liwork is issued by xerbla.


INTEGER.liworkThe dimension of the array iwork,

lwork ≥ max(1, 10n).If liwork = -1, then a workspace query is assumed; theroutine only calculates the optimal size of the work, rworkand iwork arrays, returns these values as the first entriesof the work, rwork and iwork arrays, and no error messagerelated to lwork or lrwork or liwork is issued by xerbla.

951


Output Parameters


a



REAL for cheevrwDOUBLE PRECISION for zheevr.Array, DIMENSION at least max(1, n), contains the selectedeigenvalues in ascending order, stored in w(1) to w(m).

COMPLEX for cheevrzDOUBLE COMPLEX for zheevr.Array z(ldz, *); the second dimension of z must be at leastmax(1, m).If jobz = 'V', then if info = 0, the first m columns of zcontain the orthonormal eigenvectors of the matrix Tcorresponding to the selected eigenvalues, with the i-thcolumn of z holding the eigenvector associated with w(i).If jobz = 'N', then z is not referenced.Note: you must ensure that at least max(1,m) columns aresupplied in the array z; if range = 'V', the exact value ofm is not known in advance and an upper bound must beused.

INTEGER.isuppzArray, DIMENSION at least 2 *max(1, m).The support of the eigenvectors in z, i.e., the indicesindicating the nonzero elements in z. The i-th eigenvectoris nonzero only in elements isuppz( 2i-1) through isuppz(2i ).


work(1)

On exit, if info = 0, then rwork(1) returns the requiredminimal size of lrwork.

rwork(1)

952



iwork(1)




Specific details for the routine heevr interface are the following:




z

Holds the vector of length (2*n), where the values (2*m) are significant.isuppz







Restored based on the presence of the argument z as follows: jobz= 'V', if z is present, jobz = 'N', if z is omitted Note that there willbe an error condition if isuppz is present and z is omitted.

jobz


range


953


Application Notes

For optimum performance use lwork ≥ (nb+1)*n, where nb is the maximum of the blocksizefor ?hetrd and ?unmtr returned by ilaenv.

If you are in doubt how much workspace to supply, use a generous value of lwork (or lrwork,or liwork) for the first run or set lwork = -1 (lrwork = -1, liwork = -1).

If you choose the first option and set any of admissible lwork (or lrwork, liwork) sizes, whichis no less than the minimal value described, the routine completes the task, though probablynot so fast as with a recommended workspace, and provides the recommended workspace inthe first element of the corresponding array (work, rwork, iwork) on exit. Use this value(work(1), rwork(1), iwork(1)) for subsequent runs.

If you set lwork = -1, the routine returns immediately and provides the recommendedworkspace in the first element of the corresponding array (work, rwork, iwork). This operationis called a workspace query.

Note that if you set lwork (lrwork, liwork) to less than the minimal required value and not-1, the routine returns immediately with an error exit and does not provide any information onthe recommended workspace.

Normal execution of ?stegr may create NaNs and infinities and hence may abort due to afloating point exception in environments which do not handle NaNs and infinities in the IEEEstandard default manner.

?spevComputes all eigenvalues and, optionally,eigenvectors of a real symmetric matrix in packedstorage.

Syntax

Fortran 77:

call sspev(jobz, uplo, n, ap, w, z, ldz, work, info)

call dspev(jobz, uplo, n, ap, w, z, ldz, work, info)

Fortran 95:

call spev(a, w [,uplo] [,z] [,info])

954


Description

This routine computes all the eigenvalues and, optionally, eigenvectors of a real symmetricmatrix A in packed storage.

Input Parameters


CHARACTER*1. Must be 'U' or 'L'.uploIf uplo = 'U', ap stores the packed upper triangular partof A.If uplo = 'L', ap stores the packed lower triangular partof A.


REAL for sspevap, workDOUBLE PRECISION for dspevArrays:ap(*) contains the packed upper or lower triangle ofsymmetric matrix A, as specified by uplo.The dimension of ap must be at least max(1, n*(n+1)/2).

(*) is a workspace array, DIMENSION at least max(1, 3n).work


ldz

if jobz = 'N', then ldz ≥ 1;

if jobz = 'V', then ldz ≥ max(1, n).

Output Parameters

REAL for sspevw, zDOUBLE PRECISION for dspevArrays:w(*), DIMENSION at least max(1, n).If info = 0, w contains the eigenvalues of the matrix A inascending order.

955


z(ldz,*). The second dimension of z must be at leastmax(1, n).If jobz = 'V', then if info = 0, z contains theorthonormal eigenvectors of the matrix A, with the i-thcolumn of z holding the eigenvector associated with w(i).If jobz = 'N', then z is not referenced.

On exit, this array is overwritten by the values generatedduring the reduction to tridiagonal form. The elements ofthe diagonal and the off-diagonal of the tridiagonal matrixoverwrite the corresponding elements of A.

ap




Specific details for the routine spev interface are the following:


a




Restored based on the presence of the argument z as follows: jobz= 'V', if z is present, jobz = 'N', if z is omitted.

jobz

956


?hpevComputes all eigenvalues and, optionally,eigenvectors of a Hermitian matrix in packedstorage.

Syntax

Fortran 77:

call chpev(jobz, uplo, n, ap, w, z, ldz, work, rwork, info)

call zhpev(jobz, uplo, n, ap, w, z, ldz, work, rwork, info)

Fortran 95:

call hpev(a, w [,uplo] [,z] [,info])

Description

This routine computes all the eigenvalues and, optionally, eigenvectors of a complex Hermitianmatrix A in packed storage.

Input Parameters




COMPLEX for chpevap, workDOUBLE COMPLEX for zhpev.Arrays:ap(*) contains the packed upper or lower triangle ofHermitian matrix A, as specified by uplo.The dimension of ap must be at least max(1, n*(n+1)/2).

957


(*) is a workspace array, DIMENSION at least max(1, 2n-1).work



if jobz = 'V', then ldz ≥ max(1, n) .

REAL for chpevrworkDOUBLE PRECISION for zhpev.Workspace array, DIMENSION at least max(1, 3n-2).

Output Parameters

REAL for chpevwDOUBLE PRECISION for zhpev.Array, DIMENSION at least max(1, n).If info = 0, w contains the eigenvalues of the matrix A inascending order.

COMPLEX for chpevzDOUBLE COMPLEX for zhpev.Array z(ldz,*).The second dimension of z must be at least max(1, n).If jobz = 'V', then if info = 0, z contains theorthonormal eigenvectors of the matrix A, with the i-thcolumn of z holding the eigenvector associated with w(i).If jobz = 'N', then z is not referenced.


ap

INTEGER.infoIf info = 0, the execution is successful.If info = -i, the ith parameter had an illegal value.If info = i, then the algorithm failed to converge; iindicates the number of elements of an intermediatetridiagonal form which did not converge to zero.

958




Specific details for the routine hpev interface are the following:


a





?spevdUses divide and conquer algorithm to compute alleigenvalues and (optionally) all eigenvectors of areal symmetric matrix held in packed storage.

Syntax

Fortran 77:

call sspevd(jobz, uplo, n, ap, w, z, ldz, work, lwork, iwork, liwork, info)

call dspevd(jobz, uplo, n, ap, w, z, ldz, work, lwork, iwork, liwork, info)

Fortran 95:

call spevd(a, w [,uplo] [,z] [,info])

Description

This routine computes all the eigenvalues, and optionally all the eigenvectors, of a real symmetricmatrix A (held in packed storage). In other words, it can compute the spectral factorization ofA as:

A = ZΛZT.

959



Azi = λizi for i = 1, 2, ..., n.


Input Parameters




REAL for sspevdap, workDOUBLE PRECISION for dspevdArrays:ap(*) contains the packed upper or lower triangle ofsymmetric matrix A, as specified by uplo.The dimension of ap must be at least max(1, n*(n+1)/2)work is a workspace array, its dimension max(1, lwork).






if jobz = 'N' and n > 1, then lwork ≥ 2*n;

960


if jobz = 'V' and n > 1, then

lwork ≥ n2+ 6*n + 1.If lwork = -1, then a workspace query is assumed; theroutine only calculates the required sizes of the work andiwork arrays, returns these values as the first entries of thework and iwork arrays, and no error message related tolwork or liwork is issued by xerbla. See Application Notesfor details.





if jobz = 'V' and n > 1, then liwork ≥ 5*n+3.If liwork = -1, then a workspace query is assumed; theroutine only calculates the required sizes of the work andiwork arrays, returns these values as the first entries of thework and iwork arrays, and no error message related tolwork or liwork is issued by xerbla. See Application Notesfor details.

Output Parameters

REAL for sspevdw, zDOUBLE PRECISION for dspevdArrays:w(*), DIMENSION at least max(1, n).If info = 0, contains the eigenvalues of the matrix A inascending order. See also info.z(ldz,*).The second dimension of z must be: at least 1 if jobz ='N';at least max(1, n) if jobz = 'V'.If jobz = 'V', then this array is overwritten by theorthogonal matrix Z which contains the eigenvectors of A.If jobz = 'N', then z is not referenced.

961



ap

On exit, if info = 0, then work(1) returns the requiredlwork.

work(1)

On exit, if info = 0, then iwork(1) returns the requiredliwork.

iwork(1)

INTEGER.infoIf info = 0, the execution is successful.If info = i, then the algorithm failed to converge; iindicates the number of elements of an intermediatetridiagonal form which did not converge to zero.If info = -i, the i-th parameter had an illegal value.



Specific details for the routine spevd interface are the following:


a





Application Notes


O(ε)||T||2, where εis the machine precision.


962





The complex analogue of this routine is ?hpevd.

See also ?syevd for matrices held in full storage, and ?sbevd for banded matrices.

?hpevdUses divide and conquer algorithm to compute alleigenvalues and (optionally) all eigenvectors of acomplex Hermitian matrix held in packed storage.

Syntax

Fortran 77:

call chpevd(jobz, uplo, n, ap, w, z, ldz, work, lwork, rwork, lrwork, iwork,liwork, info)

call zhpevd(jobz, uplo, n, ap, w, z, ldz, work, lwork, rwork, lrwork, iwork,liwork, info)

Fortran 95:

call hpevd(a, w [,uplo] [,z] [,info])

Description

This routine computes all the eigenvalues, and optionally all the eigenvectors, of a complexHermitian matrix A (held in packed storage). In other words, it can compute the spectral

factorization of A as: A = ZΛZH.

963



Azi = λizi for i = 1, 2, ..., n.


Input Parameters




COMPLEX for chpevdap, workDOUBLE COMPLEX for zhpevdArrays:ap(*) contains the packed upper or lower triangle ofHermitian matrix A, as specified by uplo.The dimension of ap must be at least max(1, n*(n+1)/2).work is a workspace array, its dimension max(1, lwork).






if jobz = 'N' and n > 1, then lwork ≥ n;

964


if jobz = 'V' and n > 1, then lwork ≥ 2*n.If lwork = -1, then a workspace query is assumed; theroutine only calculates the optimal size of the work, rworkand iwork arrays, returns these values as the first entriesof the work, rwork and iwork arrays, and no error messagerelated to lwork or lrwork or liwork is issued by xerbla.See Application Notes for details.

REAL for chpevdrworkDOUBLE PRECISION for zhpevdWorkspace array, its dimension max(1, lrwork).

INTEGER.lrworkThe dimension of the array rwork. Constraints:


if jobz = 'N' and n > 1, then lrwork ≥ n;

if jobz = 'V' and n > 1, then lrwork ≥ 2*n2 + 5*n +1.If lrwork = -1, then a workspace query is assumed; theroutine only calculates the optimal size of the work, rworkand iwork arrays, returns these values as the first entriesof the work, rwork and iwork arrays, and no error messagerelated to lwork or lrwork or liwork is issued by xerbla.See Application Notes for details.






965


Output Parameters

REAL for chpevdwDOUBLE PRECISION for zhpevdArray, DIMENSION at least max(1, n).If info = 0, contains the eigenvalues of the matrix A inascending order. See also info.

COMPLEX for chpevdzDOUBLE COMPLEX for zhpevdArray, DIMENSION (ldz,*).The second dimension of z must be:at least 1 if jobz = 'N';at least max(1, n) if jobz = 'V'.If jobz = 'V', then this array is overwritten by the unitarymatrix Z which contains the eigenvectors of A.If jobz = 'N', then z is not referenced.


ap


work(1)


rwork(1)


iwork(1)




966


Specific details for the routine hpevd interface are the following:


a





Application Notes


O(ε)||T||2, where ε is the machine precision.





The real analogue of this routine is ?spevd.

See also ?heevd for matrices held in full storage, and ?hbevd for banded matrices.

967


?spevxComputes selected eigenvalues and, optionally,eigenvectors of a real symmetric matrix in packedstorage.

Syntax

Fortran 77:

call sspevx(jobz, range, uplo, n, ap, vl, vu, il, iu, abstol, m, w, z, ldz,work, iwork, ifail, info)

call dspevx(jobz, range, uplo, n, ap, vl, vu, il, iu, abstol, m, w, z, ldz,work, iwork, ifail, info)

Fortran 95:

call spevx(a, w [,uplo] [,z] [,vl] [,vu] [,il] [,iu] [,m] [,ifail] [,abstol][,info])

Description

This routine computes selected eigenvalues and, optionally, eigenvectors of a real symmetricmatrix A in packed storage. Eigenvalues and eigenvectors can be selected by specifying eithera range of values or a range of indices for the desired eigenvalues.

Input Parameters



lambda(i) in the half-open interval: vl< lambda(i)≤ vu.If range = 'I', the routine computes eigenvalues withindices il to iu.

CHARACTER*1. Must be 'U' or 'L'.uploIf uplo = 'U', ap stores the packed upper triangular partof A.

968

If uplo = 'L', ap stores the packed lower triangular partof A.


REAL for sspevxap, workDOUBLE PRECISION for dspevxArrays:ap(*) contains the packed upper or lower triangle of thesymmetric matrix A, as specified by uplo.The dimension of ap must be at least max(1, n*(n+1)/2).work(*) is a workspace array, DIMENSION at least max(1,8n).

REAL for sspevxvl, vuDOUBLE PRECISION for dspevxIf range = 'V', the lower and upper bounds of the intervalto be searched for eigenvalues.Constraint: vl< vu.If range = 'A' or 'I', vl and vu are not referenced.


Constraint: 1 ≤ il ≤ iu ≤ n, if n > 0; il=1 and iu=0if n = 0.If range = 'A' or 'V', il and iu are not referenced.

REAL for sspevxabstolDOUBLE PRECISION for dspevxThe absolute error tolerance to which each eigenvalue isrequired. See Application notes for details on error tolerance.





969

Output Parameters


ap


0 ≤ m ≤ n. If range = 'A', m = n, and if range = 'I',m = iu-il+1.

REAL for sspevxw, zDOUBLE PRECISION for dspevxArrays:w(*), DIMENSION at least max(1, n).If info = 0, contains the selected eigenvalues of the matrixA in ascending order.z(ldz,*).The second dimension of z must be at least max(1, m).If jobz = 'V', then if info = 0, the first m columns of zcontain the orthonormal eigenvectors of the matrix Acorresponding to the selected eigenvalues, with the i-thcolumn of z holding the eigenvector associated with w(i).If an eigenvector fails to converge, then that column of zcontains the latest approximation to the eigenvector, andthe index of the eigenvector is returned in ifail.If jobz = 'N', then z is not referenced.Note: you must ensure that at least max(1,m) columns aresupplied in the array z; if range = 'V', the exact value ofm is not known in advance and an upper bound must beused.

INTEGER.ifailArray, DIMENSION at least max(1, n).If jobz = 'V', then if info = 0, the first m elements ofifail are zero; if info > 0, the ifail contains the indicesthe eigenvectors that failed to converge.If jobz = 'N', then ifail is not referenced.


970


If info = -i, the i-th parameter had an illegal value.If info = i, then i eigenvectors failed to converge; theirindices are stored in the array ifail.



Specific details for the routine spevx interface are the following:


a



z








Restored based on the presence of the argument z as follows:jobzjobz = 'V', if z is present,jobz = 'N', if z is omittedNote that there will be an error condition if ifail is present and z isomitted.


971


Application Notes



If abstol is less than or equal to zero, then ε*||T||1 will be used in its place, where T is thetridiagonal matrix obtained by reducing A to tridiagonal form. Eigenvalues will be computedmost accurately when abstol is set to twice the underflow threshold 2*?lamch('S'), not zero.

If this routine returns with info > 0, indicating that some eigenvectors did not converge, trysetting abstol to 2*?lamch('S').

?hpevxComputes selected eigenvalues and, optionally,eigenvectors of a Hermitian matrix in packedstorage.

Syntax

Fortran 77:

call chpevx(jobz, range, uplo, n, ap, vl, vu, il, iu, abstol, m, w, z, ldz,work, rwork, iwork, ifail, info)

call zhpevx(jobz, range, uplo, n, ap, vl, vu, il, iu, abstol, m, w, z, ldz,work, rwork, iwork, ifail, info)

Fortran 95:

call hpevx(a, w [,uplo] [,z] [,vl] [,vu] [,il] [,iu] [,m] [,ifail] [,abstol][,info])

Description

This routine computes selected eigenvalues and, optionally, eigenvectors of a complex Hermitianmatrix A in packed storage. Eigenvalues and eigenvectors can be selected by specifying eithera range of values or a range of indices for the desired eigenvalues.

Input Parameters

CHARACTER*1. Must be 'N' or 'V'.jobzIf job = 'N', then only eigenvalues are computed.

972


If job = 'V', then eigenvalues and eigenvectors arecomputed.


lambda(i) in the half-open interval: vl<lambda(i) ≤ vu.If range = 'I', the routine computes eigenvalues withindices il to iu.



COMPLEX for chpevxap, workDOUBLE COMPLEX for zhpevxArrays:ap(*) contains the packed upper or lower triangle of theHermitian matrix A, as specified by uplo.The dimension of ap must be at least max(1, n*(n+1)/2).work(*) is a workspace array, DIMENSION at least max(1,2n).

REAL for chpevxvl, vuDOUBLE PRECISION for zhpevxIf range = 'V', the lower and upper bounds of the intervalto be searched for eigenvalues.Constraint: vl< vu.If range = 'A' or 'I', vl and vu are not referenced.



REAL for chpevxabstolDOUBLE PRECISION for zhpevx

973

The absolute error tolerance to which each eigenvalue isrequired. See Application notes for details on error tolerance.




REAL for chpevxrworkDOUBLE PRECISION for zhpevxWorkspace array, DIMENSION at least max(1, 7n).


Output Parameters


ap


m


REAL for chpevxwDOUBLE PRECISION for zhpevxArray, DIMENSION at least max(1, n).If info = 0, contains the selected eigenvalues of the matrixA in ascending order.

COMPLEX for chpevxzDOUBLE COMPLEX for zhpevxArray z(ldz,*).The second dimension of z must be at least max(1, m).If jobz = 'V', then if info = 0, the first m columns of zcontain the orthonormal eigenvectors of the matrix Acorresponding to the selected eigenvalues, with the i-thcolumn of z holding the eigenvector associated with w(i).If an eigenvector fails to converge, then that column of zcontains the latest approximation to the eigenvector, andthe index of the eigenvector is returned in ifail.

974


If jobz = 'N', then z is not referenced.Note: you must ensure that at least max(1,m) columns aresupplied in the array z; if range = 'V', the exact value ofm is not known in advance and an upper bound must beused.


INTEGER.infoIf info = 0, the execution is successful.If info = -i, the ith parameter had an illegal value.If info = i, then i eigenvectors failed to converge; theirindices are stored in the array ifail.



Specific details for the routine hpevx interface are the following:


a



z








975




Application Notes





?sbevComputes all eigenvalues and, optionally,eigenvectors of a real symmetric band matrix.

Syntax

Fortran 77:

call ssbev(jobz, uplo, n, kd, ab, ldab, w, z, ldz, work, info)

call dsbev(jobz, uplo, n, kd, ab, ldab, w, z, ldz, work, info)

Fortran 95:

call sbev(a, w [,uplo] [,z] [,info])

976


Description

This routine computes all eigenvalues and, optionally, eigenvectors of a real symmetric bandmatrix A.

Input Parameters





(kd ≥ 0).

REAL for ssbevab, workDOUBLE PRECISION for dsbev.Arrays:ab (ldab,*) is an array containing either upper or lowertriangular part of the symmetric matrix A (as specified byuplo) in band storage format.The second dimension of ab must be at least max(1, n).work (*) is a workspace array.The dimension of work must be at least max(1, 3n-2).

INTEGER. The leading dimension of ab; must be at least kd+1.

ldab




Output Parameters

REAL for ssbevw, zDOUBLE PRECISION for dsbev

977


Arrays:w(*), DIMENSION at least max(1, n).If info = 0, contains the eigenvalues of the matrix A inascending order.z(ldz,*).The second dimension of z must be at least max(1, n).If jobz = 'V', then if info = 0, z contains theorthonormal eigenvectors of the matrix A, with the i-thcolumn of z holding the eigenvector associated with w(i).If jobz = 'N', then z is not referenced.

On exit, this array is overwritten by the values generatedduring the reduction to tridiagonal form.

ab

If uplo = 'U', the first superdiagonal and the diagonal ofthe tridiagonal matrix T are returned in rows kd and kd+1of ab, and if uplo = 'L', the diagonal and first subdiagonalof T are returned in the first two rows of ab.




Specific details for the routine sbev interface are the following:


a





978


?hbevComputes all eigenvalues and, optionally,eigenvectors of a Hermitian band matrix.

Syntax

Fortran 77:

call chbev(jobz, uplo, n, kd, ab, ldab, w, z, ldz, work, rwork, info)

call zhbev(jobz, uplo, n, kd, ab, ldab, w, z, ldz, work, rwork, info)

Fortran 95:

call hbev(a, w [,uplo] [,z] [,info])

Description

This routine computes all eigenvalues and, optionally, eigenvectors of a complex Hermitianband matrix A.

Input Parameters





(kd ≥ 0).

COMPLEX for chbevab, workDOUBLE COMPLEX for zhbev.Arrays:ab (ldab,*) is an array containing either upper or lowertriangular part of the Hermitian matrix A (as specified byuplo) in band storage format.The second dimension of ab must be at least max(1, n).

979


work (*) is a workspace array.The dimension of work must be at least max(1, n).


ldab




REAL for chbevrworkDOUBLE PRECISION for zhbevWorkspace array, DIMENSION at least max(1, 3n-2).

Output Parameters

REAL for chbevwDOUBLE PRECISION for zhbevArray, DIMENSION at least max(1, n).If info = 0, contains the eigenvalues in ascending order.

COMPLEX for chbevzDOUBLE COMPLEX for zhbev.Array z(ldz,*).The second dimension of z must be at least max(1, n).If jobz = 'V', then if info = 0, z contains theorthonormal eigenvectors of the matrix A, with the i-thcolumn of z holding the eigenvector associated with w(i).If jobz = 'N', then z is not referenced.


ab


INTEGER.infoIf info = 0, the execution is successful.If info = -i, the ith parameter had an illegal value.If info = i, then the algorithm failed to converge;

980


i indicates the number of elements of an intermediatetridiagonal form which did not converge to zero.



Specific details for the routine hbev interface are the following:


a





?sbevdComputes all eigenvalues and (optionally) alleigenvectors of a real symmetric band matrix usingdivide and conquer algorithm.

Syntax

Fortran 77:

call ssbevd(jobz, uplo, n, kd, ab, ldab, w, z, ldz, work, lwork, iwork, liwork,info)

call dsbevd(jobz, uplo, n, kd, ab, ldab, w, z, ldz, work, lwork, iwork, liwork,info)

Fortran 95:

call sbevd(a, w [,uplo] [,z] [,info])

981


Description

This routine computes all the eigenvalues, and optionally all the eigenvectors, of a real symmetricband matrix A. In other words, it can compute the spectral factorization of A as:

A = ZΛZT


Azi = λizi for i = 1, 2, ..., n.


Input Parameters





(kd ≥ 0).

REAL for ssbevdab, workDOUBLE PRECISION for dsbevd.Arrays:ab (ldab,*) is an array containing either upper or lowertriangular part of the symmetric matrix A (as specified byuplo) in band storage format.The second dimension of ab must be at least max(1, n).work is a workspace array, its dimension max(1,lwork).

INTEGER. The leading dimension of ab; must be at leastkd+1.

ldab

982


if jobz = 'N' and n > 1, then lwork ≥ 2n;

if jobz = 'V' and n > 1, then lwork ≥ 2*n2 + 5*n +1.If lwork = -1, then a workspace query is assumed; theroutine only calculates the optimal size of the work andiwork arrays, returns these values as the first entries of thework and iwork arrays, and no error message related tolwork or liwork is issued by xerbla. See Application Notesfor details.


INTEGER.liwork

The dimension of the array iwork. Constraints: if n ≤ 1,then liwork < 1; if job = 'N' and n > 1, then liwork< 1; if job = 'V' and n > 1, then liwork < 5*n+3.If liwork = -1, then a workspace query is assumed; theroutine only calculates the optimal size of the work andiwork arrays, returns these values as the first entries of thework and iwork arrays, and no error message related tolwork or liwork is issued by xerbla. See Application Notesfor details.

Output Parameters

REAL for ssbevdw, zDOUBLE PRECISION for dsbevdArrays:w(*), DIMENSION at least max(1, n).

983

If info = 0, contains the eigenvalues of the matrix A inascending order. See also info.z(ldz,*).The second dimension of z must be:at least 1 if job = 'N';at least max(1, n) if job = 'V'.If job = 'V', then this array is overwritten by theorthogonal matrix Z which contains the eigenvectors of A.The i-th column of Z contains the eigenvector whichcorresponds to the eigenvalue w(i).If job = 'N', then z is not referenced.


ab


work(1)


iwork(1)




Specific details for the routine sbevd interface are the following:


a




Restored based on the presence of the argument z as follows:jobz

984


jobz = 'V', if z is present,jobz = 'N', if z is omitted.

Application Notes







The complex analogue of this routine is ?hbevd.

See also ?syevd for matrices held in full storage, and ?spevd for matrices held in packedstorage.

985


?hbevdComputes all eigenvalues and (optionally) alleigenvectors of a complex Hermitian band matrixusing divide and conquer algorithm.

Syntax

Fortran 77:

call chbevd(jobz, uplo, n, kd, ab, ldab, w, z, ldz, work, lwork, rwork, lrwork,iwork, liwork, info)

call zhbevd(jobz, uplo, n, kd, ab, ldab, w, z, ldz, work, lwork, rwork, lrwork,iwork, liwork, info)

Fortran 95:

call hbevd(a, w [,uplo] [,z] [,info])

Description

This routine computes all the eigenvalues, and optionally all the eigenvectors, of a complexHermitian band matrix A. In other words, it can compute the spectral factorization of A as: A

= ZΛZH.


Azi = λizi for i = 1, 2, ..., n.


Input Parameters


CHARACTER*1. Must be 'U' or 'L'.uploIf uplo = 'U', ab stores the upper triangular part of A.

986


If uplo = 'L', ab stores the lower triangular part of A.



(kd ≥ 0).

COMPLEX for chbevdab, workDOUBLE COMPLEX for zhbevd.Arrays:ab (ldab,*) is an array containing either upper or lowertriangular part of the Hermitian matrix A (as specified byuplo) in band storage format.The second dimension of ab must be at least max(1, n).work (*) is a workspace array , its dimension max(1,lwork).

INTEGER. The leading dimension of ab; must be at leastkd+1.

ldab






if jobz = 'N' and n > 1, then lwork ≥ n;

if jobz = 'V' and n > 1, then lwork ≥ 2*n2.If lwork = -1, then a workspace query is assumed; theroutine only calculates the optimal size of the work, rworkand iwork arrays, returns these values as the first entriesof the work, rwork and iwork arrays, and no error messagerelated to lwork or lrwork or liwork is issued by xerbla.See Application Notes for details.

REAL for chbevdrworkDOUBLE PRECISION for zhbevdWorkspace array, DIMENSION at least lrwork.

987


INTEGER.lrworkThe dimension of the array rwork.Constraints:


if jobz = 'N' and n > 1, then lrwork ≥ n;

if jobz = 'V' and n > 1, then lrwork ≥ 2*n2 + 5*n +1.If lrwork = -1, then a workspace query is assumed; theroutine only calculates the optimal size of the work, rworkand iwork arrays, returns these values as the first entriesof the work, rwork and iwork arrays, and no error messagerelated to lwork or lrwork or liwork is issued by xerbla.See Application Notes for details.

INTEGER. Workspace array, DIMENSION max(1, liwork).iwork


if jobz = 'N' or n ≤ 1, then liwork ≥ 1;


Output Parameters

REAL for chbevdwDOUBLE PRECISION for zhbevdArray, DIMENSION at least max(1, n).If info = 0, contains the eigenvalues of the matrix A inascending order. See also info.

COMPLEX for chbevdzDOUBLE COMPLEX for zhbevdArray, DIMENSION (ldz,*).

988


The second dimension of z must be:at least 1 if jobz = 'N';at least max(1, n) if jobz = 'V'.If jobz = 'V', then this array is overwritten by the unitarymatrix Z which contains the eigenvectors of A. The i-thcolumn of Z contains the eigenvector which corresponds tothe eigenvalue w(i).If jobz = 'N', then z is not referenced.


ab

On exit, if lwork > 0, then the real part of work(1) returnsthe required minimal size of lwork.

work(1)

On exit, if lrwork > 0, then rwork(1) returns the requiredminimal size of lrwork.

rwork(1)


iwork(1)




Specific details for the routine hbevd interface are the following:


a




Restored based on the presence of the argument z as follows:jobzjobz = 'V', if z is present,

989


jobz = 'N', if z is omitted.

Application Notes







The real analogue of this routine is ?sbevd.

See also ?heevd for matrices held in full storage, and ?hpevd for matrices held in packedstorage.

?sbevxComputes selected eigenvalues and, optionally,eigenvectors of a real symmetric band matrix.

Syntax

Fortran 77:

call ssbevx(jobz, range, uplo, n, kd, ab, ldab, q, ldq, vl, vu, il, iu, abstol,m, w, z, ldz, work, iwork, ifail, info)

call dsbevx(jobz, range, uplo, n, kd, ab, ldab, q, ldq, vl, vu, il, iu, abstol,m, w, z, ldz, work, iwork, ifail, info)

990


Fortran 95:

call sbevx(a, w [,uplo] [,z] [,vl] [,vu] [,il] [,iu] [,m] [,ifail] [,q][,abstol] [,info])

Description

This routine computes selected eigenvalues and, optionally, eigenvectors of a real symmetricband matrix A. Eigenvalues and eigenvectors can be selected by specifying either a range ofvalues or a range of indices for the desired eigenvalues.

Input Parameters


CHARACTER*1. Must be 'A' or 'V' or 'I'.rangeIf range = 'A', the routine computes all eigenvalues.

If range = 'V', the routine computes eigenvalues λi in

the half-open interval: vl<λi ≤ vu.If range = 'I', the routine computes eigenvalues withindices il to iu.




(kd ≥ 0).

REAL for ssbevxab, workDOUBLE PRECISION for dsbevx.Arrays:ab (ldab,*) is an array containing either upper or lowertriangular part of the symmetric matrix A (as specified byuplo) in band storage format.The second dimension of ab must be at least max(1, n).work (*) is a workspace array.The dimension of work must be at least max(1, 7n).

991

ldab

REAL for ssbevxvl, vuDOUBLE PRECISION for dsbevx.If range = 'V', the lower and upper bounds of the intervalto be searched for eigenvalues.Constraint: vl< vu.If range = 'A' or 'I', vl and vu are not referenced.



REAL for chpevxabstolDOUBLE PRECISION for zhpevxThe absolute error tolerance to which each eigenvalue isrequired. See Application notes for details on error tolerance.

INTEGER. The leading dimensions of the output arrays qand z, respectively.

ldq, ldz

Constraints:

ldq ≥ 1, ldz ≥ 1;

If jobz = 'V', then ldq ≥ max(1, n) and ldz ≥ max(1,n).


Output Parameters

REAL for ssbevx DOUBLE PRECISION for dsbevx.qArray, DIMENSION (ldz,n).If jobz = 'V', the n-by-n orthogonal matrix is used in thereduction to tridiagonal form.If jobz = 'N', the array q is not referenced.


m


992

REAL for ssbevxw, zDOUBLE PRECISION for dsbevxArrays:w(*), DIMENSION at least max(1, n). The first m elementsof w contain the selected eigenvalues of the matrix A inascending order.z(ldz,*).The second dimension of z must be at least max(1, m).If jobz = 'V', then if info = 0, the first m columns of zcontain the orthonormal eigenvectors of the matrix Acorresponding to the selected eigenvalues, with the i-thcolumn of z holding the eigenvector associated with w(i).If an eigenvector fails to converge, then that column of zcontains the latest approximation to the eigenvector, andthe index of the eigenvector is returned in ifail.If jobz = 'N', then z is not referenced.Note: you must ensure that at least max(1,m) columns aresupplied in the array z; if range = 'V', the exact value ofm is not known in advance and an upper bound must beused.


ab



INTEGER.infoIf info = 0, the execution is successful.If info = -i, the ith parameter had an illegal value.If info = i, then i eigenvectors failed to converge; theirindices are stored in the array ifail.

993




Specific details for the routine sbevx interface are the following:


a



z


Holds the matrix Q of size (n, n).q







Restored based on the presence of the argument z as follows:jobzjobz = 'V', if z is present,jobz = 'N', if z is omittedNote that there will be an error condition if either ifail or q is presentand z is omitted.


994


Application Notes





?hbevxComputes selected eigenvalues and, optionally,eigenvectors of a Hermitian band matrix.

Syntax

Fortran 77:

call chbevx(jobz, range, uplo, n, kd, ab, ldab, q, ldq, vl, vu, il, iu, abstol,m, w, z, ldz, work, rwork, iwork, ifail, info)

call zhbevx(jobz, range, uplo, n, kd, ab, ldab, q, ldq, vl, vu, il, iu, abstol,m, w, z, ldz, work, rwork, iwork, ifail, info)

Fortran 95:

call hbevx(a, w [,uplo] [,z] [,vl] [,vu] [,il] [,iu] [,m] [,ifail] [,q][,abstol] [,info])

Description

This routine computes selected eigenvalues and, optionally, eigenvectors of a complex Hermitianband matrix A. Eigenvalues and eigenvectors can be selected by specifying either a range ofvalues or a range of indices for the desired eigenvalues.

Input Parameters

CHARACTER*1. Must be 'N' or 'V'.jobzIf job = 'N', then only eigenvalues are computed.

995


If job = 'V', then eigenvalues and eigenvectors arecomputed.


lambda(i) in the half-open interval: vl< lambda(i) ≤vu.If range = 'I', the routine computes eigenvalues withindices il to iu.




(kd ≥ 0).

COMPLEX for chbevxab, workDOUBLE COMPLEX for zhbevx.Arrays:ab (ldab,*) is an array containing either upper or lowertriangular part of the Hermitian matrix A (as specified byuplo) in band storage format.The second dimension of ab must be at least max(1, n).work (*) is a workspace array.The dimension of work must be at least max(1, n).


ldab

REAL for chbevxvl, vuDOUBLE PRECISION for zhbevx.If range = 'V', the lower and upper bounds of the intervalto be searched for eigenvalues.Constraint: vl< vu.If range = 'A' or 'I', vl and vu are not referenced.


996


REAL for chbevxabstolDOUBLE PRECISION for zhbevx.The absolute error tolerance to which each eigenvalue isrequired. See Application notes for details on error tolerance.

INTEGER. The leading dimensions of the output arrays qand z, respectively.

ldq, ldz

Constraints:

ldq ≥ 1, ldz ≥ 1;

If jobz = 'V', then ldq ≥ max(1, n) and ldz ≥ max(1,n).

REAL for chbevxrworkDOUBLE PRECISION for zhbevxWorkspace array, DIMENSION at least max(1, 7n).


Output Parameters

COMPLEX for chbevx DOUBLE COMPLEX for zhbevx.qArray, DIMENSION (ldz,n).If jobz = 'V', the n-by-n unitary matrix is used in thereduction to tridiagonal form.If jobz = 'N', the array q is not referenced.



REAL for chbevxwDOUBLE PRECISION for zhbevxArray, DIMENSION at least max(1, n). The first m elementscontain the selected eigenvalues of the matrix A in ascendingorder.

COMPLEX for chbevxzDOUBLE COMPLEX for zhbevx.Array z(ldz,*).

997


The second dimension of z must be at least max(1, m).If jobz = 'V', then if info = 0, the first m columns of zcontain the orthonormal eigenvectors of the matrix Acorresponding to the selected eigenvalues, with the i-thcolumn of z holding the eigenvector associated with w(i).If an eigenvector fails to converge, then that column of zcontains the latest approximation to the eigenvector, andthe index of the eigenvector is returned in ifail.If jobz = 'N', then z is not referenced.Note: you must ensure that at least max(1,m) columns aresupplied in the array z; if range = 'V', the exact value ofm is not known in advance and an upper bound must beused.


ab


INTEGER.ifailArray, DIMENSION at least max(1, n).If jobz = 'V', then if info = 0, the first m elements ofifail are zero; if info > 0, the ifail contains the indicesof the eigenvectors that failed to converge.If jobz = 'N', then ifail is not referenced.




Specific details for the routine hbevx interface are the following:

998



a



z









Restored based on the presence of the argument z as follows:jobzjobz = 'V', if z is present,jobz = 'N', if z is omittedNote that there will be an error condition if either ifail or q is presentand z is omitted.


Application Notes




999



?stevComputes all eigenvalues and, optionally,eigenvectors of a real symmetric tridiagonal matrix.

Syntax

Fortran 77:

call sstev(jobz, n, d, e, z, ldz, work, info)

call dstev(jobz, n, d, e, z, ldz, work, info)

Fortran 95:

call stev(d, e [,z] [,info])

Description

This routine computes all eigenvalues and, optionally, eigenvectors of a real symmetrictridiagonal matrix A.

Input Parameters



REAL for sstevd, e, workDOUBLE PRECISION for dstev.Arrays:d(*) contains the n diagonal elements of the tridiagonalmatrix A.The dimension of d must be at least max(1, n).e(*) contains the n-1 subdiagonal elements of the tridiagonalmatrix A.The dimension of e must be at least max(1, n-1). The n-thelement of this array is used as workspace.

1000


work(*) is a workspace array.The dimension of work must be at least max(1, 2n-2).If jobz = 'N', work is not referenced.

INTEGER. The leading dimension of the output array z; ldz

≥ 1. If jobz = 'V' then ldz ≥ max(1, n).

ldz

Output Parameters

On exit, if info = 0, contains the eigenvalues of the matrixA in ascending order.

d

REAL for sstevzDOUBLE PRECISION for dstevArray, DIMENSION (ldz, *).The second dimension of z must be at least max(1, n).If jobz = 'V', then if info = 0, z contains theorthonormal eigenvectors of the matrix A, with the i-thcolumn of z holding the eigenvector associated with theeigenvalue returned in d(i).If job = 'N', then z is not referenced.

On exit, this array is overwritten with intermediate results.e

INTEGER.infoIf info = 0, the execution is successful.If info = -i, the i-th parameter had an illegal value.If info = i, then the algorithm failed to converge;i elements of e did not converge to zero.



Specific details for the routine stev interface are the following:





1001



?stevdComputes all eigenvalues and (optionally) alleigenvectors of a real symmetric tridiagonal matrixusing divide and conquer algorithm.

Syntax

Fortran 77:

call sstevd(jobz, n, d, e, z, ldz, work, lwork, iwork, liwork, info)

call dstevd(jobz, n, d, e, z, ldz, work, lwork, iwork, liwork, info)

Fortran 95:

call stevd(d, e [,z] [,info])

Description

This routine computes all the eigenvalues, and optionally all the eigenvectors, of a real symmetrictridiagonal matrix T. In other words, the routine can compute the spectral factorization of T

as: T = ZΛZT.


Tzi = λizi for i = 1, 2, ..., n.


There is no complex analogue of this routine.

Input Parameters


1002


REAL for sstevdd, e, workDOUBLE PRECISION for dstevd.Arrays:d(*) contains the n diagonal elements of the tridiagonalmatrix T.The dimension of d must be at least max(1, n).e(*) contains the n-1 off-diagonal elements of T.The dimension of e must be at least max(1, n-1). The n-thelement of this array is used as workspace.work(*) is a workspace array.The dimension of work must be at least lwork.


ldz

ldz ≥ 1 if job = 'N';ldz < max(1, n) if job = 'V'.


if jobz = 'N' or n ≤ 1, then lwork ≥ 1;

if jobz = 'V' and n > 1, then lwork ≥ n2 + 4*n + 1.If lwork = -1, then a workspace query is assumed; theroutine only calculates the required sizes of the work andiwork arrays, returns these values as the first entries of thework and iwork arrays, and no error message related tolwork or liwork is issued by xerbla. See Application Notesfor details.



if jobz = 'N' or n ≤ 1, then liwork ≥ 1;

if jobz = 'V' and n > 1, then liwork ≥ 5*n+3.

1003

If liwork = -1, then a workspace query is assumed; theroutine only calculates the required sizes of the work andiwork arrays, returns these values as the first entries of thework and iwork arrays, and no error message related tolwork or liwork is issued by xerbla. See Application Notesfor details.

Output Parameters

On exit, if info = 0, contains the eigenvalues of the matrixT in ascending order.

d

See also info.

REAL for sstevdzDOUBLE PRECISION for dstevdArray, DIMENSION (ldz, *).The second dimension of z must be:at least 1 if jobz = 'N';at least max(1, n) if jobz = 'V'.If jobz = 'V', then this array is overwritten by theorthogonal matrix Z which contains the eigenvectors of T.If jobz = 'N', then z is not referenced.

On exit, this array is overwritten with intermediate results.e


work(1)


iwork(1)




1004


Specific details for the routine stevd interface are the following:





Application Notes




|μi - λi| ≤ c(n)ε ||T||2



the angle θ(zi, wi) between them is bounded as follows:

θ(zi, wi) ≤ c(n)ε ||T||2 / min i≠j|λi - λj|.

Thus the accuracy of a computed eigenvector depends on the gap between its eigenvalue andall the other eigenvalues.




1005


?stevxComputes selected eigenvalues and eigenvectorsof a real symmetric tridiagonal matrix.

Syntax

Fortran 77:

call sstevx(jobz, range, n, d, e, vl, vu, il, iu, abstol, m, w, z, ldz, work,iwork, ifail, info)

call dstevx(jobz, range, n, d, e, vl, vu, il, iu, abstol, m, w, z, ldz, work,iwork, ifail, info)

Fortran 95:

call stevx(d, e, w [, z] [,vl] [,vu] [,il] [,iu] [,m] [,ifail] [,abstol][,info])

Description

This routine computes selected eigenvalues and, optionally, eigenvectors of a real symmetrictridiagonal matrix A. Eigenvalues and eigenvectors can be selected by specifying either a rangeof values or a range of indices for the desired eigenvalues.

Input Parameters



lambda(i) in the half-open interval: vl<lambda(i)≤ vu.If range = 'I', the routine computes eigenvalues withindices il to iu.

1006

REAL for sstevxd, e, workDOUBLE PRECISION for dstevx.Arrays:d(*) contains the n diagonal elements of the tridiagonalmatrix A.The dimension of d must be at least max(1, n).e(*) contains the n-1 subdiagonal elements of A.The dimension of e must be at least max(1, n-1). The n-thelement of this array is used as workspace.work(*) is a workspace array.The dimension of work must be at least max(1, 5n).

REAL for sstevxvl, vuDOUBLE PRECISION for dstevx.If range = 'V', the lower and upper bounds of the intervalto be searched for eigenvalues.Constraint: vl< vu.If range = 'A' or 'I', vl and vu are not referenced.



REAL for sstevxabstolDOUBLE PRECISION for dstevx. The absolute error toleranceto which each eigenvalue is required. See Application notesfor details on error tolerance.

INTEGER. The leading dimensions of the output array z;

ldz ≥ 1. If jobz = 'V', then ldz ≥ max(1, n).

ldz


Output Parameters


0 ≤ m ≤ n.

1007


REAL for sstevxw, zDOUBLE PRECISION for dstevx.Arrays:w(*), DIMENSION at least max(1, n).The first m elements of w contain the selected eigenvaluesof the matrix A in ascending order.

z(ldz,*).The second dimension of z must be at least max(1, m).If jobz = 'V', then if info = 0, the first m columns of zcontain the orthonormal eigenvectors of the matrix Acorresponding to the selected eigenvalues, with the i-thcolumn of z holding the eigenvector associated with w(i).If an eigenvector fails to converge, then that column of zcontains the latest approximation to the eigenvector, andthe index of the eigenvector is returned in ifail.If jobz = 'N', then z is not referenced.Note: you must ensure that at least max(1,m) columns aresupplied in the array z; if range = 'V', the exact value ofm is not known in advance and an upper bound must beused.

On exit, these arrays may be multiplied by a constant factorchosen to avoid overflow or underflow in computing theeigenvalues.

d, e



1008




Specific details for the routine stevx interface are the following:





z









Application Notes



1009


If abstol is less than or equal to zero, then ε*||A||1 will be used in its place. Eigenvalues willbe computed most accurately when abstol is set to twice the underflow threshold 2*?lamch('S'),not zero.


?stevrComputes selected eigenvalues and, optionally,eigenvectors of a real symmetric tridiagonal matrixusing the Relatively Robust Representations.

Syntax

Fortran 77:

call sstevr(jobz, range, n, d, e, vl, vu, il, iu, abstol, m, w, z, ldz, isuppz,work, lwork, iwork, liwork, info)

call dstevr(jobz, range, n, d, e, vl, vu, il, iu, abstol, m, w, z, ldz, isuppz,work, lwork, iwork, liwork, info)

Fortran 95:

call stevr(d, e, w [, z] [,vl] [,vu] [,il] [,iu] [,m] [,isuppz] [,abstol][,info])

Description

This routine computes selected eigenvalues and, optionally, eigenvectors of a real symmetrictridiagonal matrix T. Eigenvalues and eigenvectors can be selected by specifying either a rangeof values or a range of indices for the desired eigenvalues.

Whenever possible, the routine calls ?stemr to compute the eigenspectrum using RelativelyRobust Representations. ?stegr computes eigenvalues by the dqds algorithm, while orthogonaleigenvectors are computed from various “good” L*D*LT representations (also known as RelativelyRobust Representations). Gram-Schmidt orthogonalization is avoided as far as possible. Morespecifically, the various steps of the algorithm are as follows. For the i-th unreduced block ofT,

a. Compute T - σi = Li*Di*LiT, such that Li *Di *Li

T is a relatively robust representation;

b. Compute the eigenvalues, λj, of Li*Di*LiT to high relative accuracy by the dqds algorithm;

1010


c. If there is a cluster of close eigenvalues, “choose” σi close to the cluster, and go to step(a);

d. Given the approximate eigenvalue λj of Li Di LiT, compute the corresponding eigenvector

by forming a rank-revealing twisted factorization.


The routine ?stevr calls ?stemr when the full spectrum is requested on machines whichconform to the IEEE-754 floating point standard. ?stevr calls ?stebz and ?stein on non-IEEEmachines and when partial spectrum requests are made.

Input Parameters



vl<lambda(i) ≤ vu.If range = 'I', the routine computes eigenvalues withindices il to iu.For range = 'V'or 'I' and iu-il < n-1, sstebz/dstebzand sstein/dstein are called.


REAL for sstevrd, e, workDOUBLE PRECISION for dstevr.Arrays:d(*) contains the n diagonal elements of the tridiagonalmatrix T.The dimension of d must be at least max(1, n).e(*) contains the n-1 subdiagonal elements of A.The dimension of e must be at least max(1, n-1). The n-thelement of this array is used as workspace.work is a workspace array, its dimension max(1,lwork).

1011

REAL for sstevrvl, vuDOUBLE PRECISION for dstevr.If range = 'V', the lower and upper bounds of the intervalto be searched for eigenvalues.Constraint: vl< vu.If range = 'A' or 'I', vl and vu are not referenced.


Constraint: 1 ≤ il ≤ iu≤ n, if n > 0; il=1 and iu=0 ifn = 0.If range = 'A' or 'V', il and iu are not referenced.

REAL for ssyevrabstolDOUBLE PRECISION for dsyevr.The absolute error tolerance to which eacheigenvalue/eigenvector is required.If jobz = 'V', the eigenvalues and eigenvectors outputhave residual norms bounded by abstol, and the dotproducts between different eigenvectors are bounded byabstol. If abstol < n *eps*|T|, then n *eps*|T| willbe used in its place, where eps is the machine precision,and |T| is the 1-norm of the matrix T. The eigenvalues arecomputed to an accuracy of eps*|T| irrespective of abstol.If high relative accuracy is important, set abstol to?lamch('S').


ldz ≥ 1 if jobz = 'N';

ldz ≥ max(1, n) if jobz = 'V'.

INTEGER.lworkThe dimension of the array work. Constraint:

lwork ≥ max(1, 20*n).If lwork = -1, then a workspace query is assumed; theroutine only calculates the required sizes of the work andiwork arrays, returns these values as the first entries of the

1012

work and iwork arrays, and no error message related tolwork or liwork is issued by xerbla. See Application Notesfor details.


INTEGER.liworkThe dimension of the array iwork,

lwork ≥ max(1, 10*n).If liwork = -1, then a workspace query is assumed; theroutine only calculates the required sizes of the work andiwork arrays, returns these values as the first entries of thework and iwork arrays, and no error message related tolwork or liwork is issued by xerbla. See Application Notesfor details.

Output Parameters



REAL for sstevrw, zDOUBLE PRECISION for dstevr.Arrays:w(*), DIMENSION at least max(1, n).The first m elements of w contain the selected eigenvaluesof the matrix T in ascending order.z(ldz,*).The second dimension of z must be at least max(1, m).If jobz = 'V', then if info = 0, the first m columns of zcontain the orthonormal eigenvectors of the matrix Tcorresponding to the selected eigenvalues, with the i-thcolumn of z holding the eigenvector associated with w(i).If jobz = 'N', then z is not referenced.Note: you must ensure that at least max(1,m) columns aresupplied in the array z; if range = 'V', the exact value ofm is not known in advance and an upper bound must beused.

1013


On exit, these arrays may be multiplied by a constant factorchosen to avoid overflow or underflow in computing theeigenvalues.

d, e

INTEGER.isuppzArray, DIMENSION at least 2 *max(1, m).The support of the eigenvectors in z, i.e., the indicesindicating the nonzero elements in z. The i-th eigenvectoris nonzero only in elements isuppz( 2i-1) through isuppz(2i ).Implemented only for range = 'A' or 'I' and iu-il =n-1.


work(1)


iwork(1)




Specific details for the routine stevr interface are the following:





z

Holds the vector of length (2*n), where the values (2*m) are significant.isuppz




1014






Application Notes

Normal execution of the routine ?stegr may create NaNs and infinities and hence may abortdue to a floating point exception in environments which do not handle NaNs and infinities inthe IEEE standard default manner.





Nonsymmetric Eigenproblems

This section describes LAPACK driver routines used for solving nonsymmetric eigenproblems.See also computational routines that can be called to solve these problems.

1015


Table 4-11 lists all such driver routines for Fortran-77 interface. Respective routine names inFortran-95 interface are without the first symbol (see Routine Naming Conventions).

Table 4-11 Driver Routines for Solving Nonsymmetric Eigenproblems


Computes the eigenvalues and Schur factorization of a general matrix, andorders the factorization so that selected eigenvalues are at the top left ofthe Schur form.

?gees

Computes the eigenvalues and Schur factorization of a general matrix,orders the factorization and computes reciprocal condition numbers.

?geesx

Computes the eigenvalues and left and right eigenvectors of a generalmatrix.

?geev

Computes the eigenvalues and left and right eigenvectors of a generalmatrix, with preliminary matrix balancing, and computes reciprocal conditionnumbers for the eigenvalues and right eigenvectors.

?geevx

?geesComputes the eigenvalues and Schur factorizationof a general matrix, and orders the factorizationso that selected eigenvalues are at the top left ofthe Schur form.

Syntax

Fortran 77:

call sgees(jobvs, sort, select, n, a, lda, sdim, wr, wi, vs, ldvs, work,lwork, bwork, info)

call dgees(jobvs, sort, select, n, a, lda, sdim, wr, wi, vs, ldvs, work,lwork, bwork, info)

call cgees(jobvs, sort, select, n, a, lda, sdim, w, vs, ldvs, work, lwork,rwork, bwork, info)

call zgees(jobvs, sort, select, n, a, lda, sdim, w, vs, ldvs, work, lwork,rwork, bwork, info)

1016


Fortran 95:

call gees(a, wr, wi [,vs] [,select] [,sdim] [,info])

call gees(a, w [,vs] [,select] [,sdim] [,info])

Description

This routine computes for an n-by-n real/complex nonsymmetric matrix A, the eigenvalues, thereal Schur form T, and, optionally, the matrix of Schur vectors Z. This gives the Schurfactorization A = Z*T*ZH.

Optionally, it also orders the eigenvalues on the diagonal of the real-Schur/Schur form so thatselected eigenvalues are at the top left. The leading columns of Z then form an orthonormalbasis for the invariant subspace corresponding to the selected eigenvalues.

A real matrix is in real-Schur form if it is upper quasi-triangular with 1-by-1 and 2-by-2 blocks.2-by-2 blocks will be standardized in the form

where b*c < 0. The eigenvalues of such a block are

A complex matrix is in Schur form if it is upper triangular.

Input Parameters

CHARACTER*1. Must be 'N' or 'V'.jobvsIf jobvs = 'N', then Schur vectors are not computed.If jobvs = 'V', then Schur vectors are computed.

CHARACTER*1. Must be 'N' or 'S'. Specifies whether ornot to order the eigenvalues on the diagonal of the Schurform.

sort

If sort = 'N', then eigenvalues are not ordered.If sort = 'S', eigenvalues are ordered (see select).

LOGICAL FUNCTION of two REAL arguments for real flavors.select

1017

LOGICAL FUNCTION of one COMPLEX argument for complexflavors.select must be declared EXTERNAL in the calling subroutine.If sort = 'S', select is used to select eigenvalues to sortto the top left of the Schur form.If sort = 'N', select is not referenced.For real flavors:An eigenvalue wr(j)+sqrt(-1)*wi(j) is selected ifselect(wr(j), wi(j)) is true; that is, if either one of acomplex conjugate pair of eigenvalues is selected, then bothcomplex eigenvalues are selected.Note that a selected complex eigenvalue may no longersatisfy select(wr(j), wi(j))= .TRUE. after ordering, sinceordering may change the value of complex eigenvalues(especially if the eigenvalue is ill-conditioned); in this caseinfo may be set to n+2 (see info below).For complex flavors:An eigenvalue w(j) is selected if select(w(j)) is true.


REAL for sgeesa, workDOUBLE PRECISION for dgeesCOMPLEX for cgeesDOUBLE COMPLEX for zgees.Arrays:a(lda,*) is an array containing the n-by-n matrix A.The second dimension of a must be at least max(1, n).work is a workspace array, its dimension max(1,lwork).


lda

INTEGER. The leading dimension of the output array vs.Constraints:

ldvs

ldvs ≥ 1;

ldvs ≥ max(1, n) if jobvs = 'V'.


1018


Constraint:

lwork ≥ max(1, 3n) for real flavors;

lwork ≥ max(1, 2n) for complex flavors.If lwork = -1, then a workspace query is assumed; theroutine only calculates the optimal size of the work array,returns this value as the first entry of the work array, andno error message related to lwork is issued by xerbla.

REAL for cgeesrworkDOUBLE PRECISION for zgeesWorkspace array, DIMENSION at least max(1, n). Used incomplex flavors only.

LOGICAL. Workspace array, DIMENSION at least max(1, n).Not referenced if sort = 'N'.

bwork

Output Parameters

On exit, this array is overwritten by the real-Schur/Schurform T.

a

INTEGER.sdimIf sort = 'N', sdim= 0.If sort = 'S', sdim is equal to the number of eigenvalues(after sorting) for which select is true.Note that for real flavors complex conjugate pairs for whichselect is true for either eigenvalue count as 2.

REAL for sgeeswr, wiDOUBLE PRECISION for dgeesArrays, DIMENSION at least max (1, n) each. Contain thereal and imaginary parts, respectively, of the computedeigenvalues, in the same order that they appear on thediagonal of the output real-Schur form T. Complex conjugatepairs of eigenvalues appear consecutively with theeigenvalue having positive imaginary part first.

COMPLEX for cgeeswDOUBLE COMPLEX for zgees.Array, DIMENSION at least max(1, n). Contains the computedeigenvalues. The eigenvalues are stored in the same orderas they appear on the diagonal of the output Schur form T.

1019


REAL for sgeesvsDOUBLE PRECISION for dgeesCOMPLEX for cgeesDOUBLE COMPLEX for zgees.Array vs(ldvs,*);the second dimension of vs must be atleast max(1, n).If jobvs = 'V', vs contains the orthogonal/unitary matrixZ of Schur vectors.If jobvs = 'N', vs is not referenced.


work(1)

INTEGER.infoIf info = 0, the execution is successful.If info = -i, the ith parameter had an illegal value.If info = i, and

i ≤ n:the QR algorithm failed to compute all the eigenvalues;elements 1:ilo-1 and i+1:n of wr and wi (for real flavors)or w (for complex flavors) contain those eigenvalues whichhave converged; if jobvs = 'V', vs contains the matrixwhich reduces A to its partially converged Schur form;i = n+1:the eigenvalues could not be reordered because someeigenvalues were too close to separate (the problem is veryill-conditioned);i = n+2:after reordering, round-off changed values of some complexeigenvalues so that leading eigenvalues in the Schur formno longer satisfy select = .TRUE.. This could also becaused by underflow due to scaling.



Specific details for the routine gees interface are the following:

1020






Holds the matrix VS of size (n, n).vs

Restored based on the presence of the argument vs as follows:jobvsjobvs = 'V', if vs is present,jobvs = 'N', if vs is omitted.

Restored based on the presence of the argument select as follows:sortsort = 'S', if select is present,sort = 'N', if select is omitted.

Application Notes





1021


?geesxComputes the eigenvalues and Schur factorizationof a general matrix, orders the factorization andcomputes reciprocal condition numbers.

Syntax

Fortran 77:

call sgeesx(jobvs, sort, select, sense, n, a, lda, sdim, wr, wi, vs, ldvs,rconde, rcondv, work, lwork, iwork, liwork, bwork, info)

call dgeesx(jobvs, sort, select, sense, n, a, lda, sdim, wr, wi, vs, ldvs,rconde, rcondv, work, lwork, iwork, liwork, bwork, info)

call cgeesx(jobvs, sort, select, sense, n, a, lda, sdim, w, vs, ldvs, rconde,rcondv, work, lwork, rwork, bwork, info)

call zgeesx(jobvs, sort, select, sense, n, a, lda, sdim, w, vs, ldvs, rconde,rcondv, work, lwork, rwork, bwork, info)

Fortran 95:

call geesx(a, wr, wi [,vs] [,select] [,sdim] [,rconde] [,rcondev] [,info])

call geesx(a, w [,vs] [,select] [,sdim] [,rconde] [,rcondev] [,info])

Description

This routine computes for an n-by-n real/complex nonsymmetric matrix A, the eigenvalues, thereal-Schur/Schur form T, and, optionally, the matrix of Schur vectors Z. This gives the Schurfactorization A = Z*T*ZH.

Optionally, it also orders the eigenvalues on the diagonal of the real-Schur/Schur form so thatselected eigenvalues are at the top left; computes a reciprocal condition number for the averageof the selected eigenvalues (rconde); and computes a reciprocal condition number for the rightinvariant subspace corresponding to the selected eigenvalues (rcondv). The leading columnsof Z form an orthonormal basis for this invariant subspace.

For further explanation of the reciprocal condition numbers rconde and rcondv, see [LUG],Section 4.10 (where these quantities are called s and sep respectively).

A real matrix is in real-Schur form if it is upper quasi-triangular with 1-by-1 and 2-by-2 blocks.2-by-2 blocks will be standardized in the form

1022


where b*c < 0. The eigenvalues of such a block are

A complex matrix is in Schur form if it is upper triangular.

Input Parameters

CHARACTER*1. Must be 'N' or 'V'.jobvsIf jobvs = 'N', then Schur vectors are not computed.If jobvs = 'V', then Schur vectors are computed.

CHARACTER*1. Must be 'N' or 'S'. Specifies whether ornot to order the eigenvalues on the diagonal of the Schurform.

sort

If sort = 'N', then eigenvalues are not ordered.If sort = 'S', eigenvalues are ordered (see select).

LOGICAL FUNCTION of two REAL arguments for real flavors.selectLOGICAL FUNCTION of one COMPLEX argument for complexflavors.select must be declared EXTERNAL in the calling subroutine.If sort = 'S', select is used to select eigenvalues to sortto the top left of the Schur form.If sort = 'N', select is not referenced.For real flavors:An eigenvalue wr(j)+sqrt(-1)*wi(j) is selected ifselect(wr(j), wi(j)) is true; that is, if either one of acomplex conjugate pair of eigenvalues is selected, then bothcomplex eigenvalues are selected.Note that a selected complex eigenvalue may no longersatisfy select(wr(j), wi(j)) = .TRUE. after ordering, sinceordering may change the value of complex eigenvalues(especially if the eigenvalue is ill-conditioned); in this caseinfo may be set to n+2 (see info below).For complex flavors:

1023

An eigenvalue w(j) is selected if select(w(j)) is true.

CHARACTER*1. Must be 'N', 'E', 'V', or 'B'. Determineswhich reciprocal condition number are computed.

sense

If sense = 'N', none are computed;If sense = 'E', computed for average of selectedeigenvalues only;If sense = 'V', computed for selected right invariantsubspace only;If sense = 'B', computed for both.If sense is 'E', 'V', or 'B', then sort must equal 'S'.


REAL for sgeesxa, workDOUBLE PRECISION for dgeesxCOMPLEX for cgeesxDOUBLE COMPLEX for zgeesx.Arrays:a(lda,*) is an array containing the n-by-n matrix A.The second dimension of a must be at least max(1, n).work is a workspace array, its dimension max(1, lwork).


lda

INTEGER. The leading dimension of the output array vs.Constraints:

ldvs

ldvs ≥ 1;

ldvs ≥ max(1, n)if jobvs = 'V'.

INTEGER.lworkThe dimension of the array work. Constraint:

lwork ≥ max(1, 3n) for real flavors;

lwork ≥ max(1, 2n) for complex flavors.Also, if sense = 'E', 'V', or 'B', then

lwork ≥ n+2*sdim*(n-sdim) for real flavors;

lwork ≥ 2*sdim*(n-sdim) for complex flavors;where sdim is the number of selected eigenvalues computedby this routine.

1024


Note that 2*sdim*(n-sdim) ≤ n*n/2. Note also that anerror is only returned if lwork<max(1, 2*n), but if sense= 'E', or 'V', or 'B' this may not be large enough.For good performance, lwork must generally be larger.If lwork = -1, then a workspace query is assumed; theroutine only calculates upper bound on the optimal size ofthe array work, returns this value as the first entry of thework array, and no error message related to lwork is issuedby xerbla.

INTEGER.iworkWorkspace array, DIMENSION (liwork). Used in real flavorsonly. Not referenced if sense = 'N' or 'E'.

INTEGER.liworkThe dimension of the array iwork. Used in real flavors only.Constraint:

liwork ≥ 1;

if sense = 'V' or 'B', liwork ≥ sdim*(n-sdim).

REAL for cgeesxrworkDOUBLE PRECISION for zgeesxWorkspace array, DIMENSION at least max(1, n). Used incomplex flavors only.

LOGICAL. Workspace array, DIMENSION at least max(1, n).Not referenced if sort = 'N'.

bwork

Output Parameters

On exit, this array is overwritten by the real-Schur/Schurform T.

a

INTEGER.sdimIf sort = 'N', sdim= 0.If sort = 'S', sdim is equal to the number of eigenvalues(after sorting) for which select is true.Note that for real flavors complex conjugate pairs for whichselect is true for either eigenvalue count as 2.

REAL for sgeesxwr, wiDOUBLE PRECISION for dgeesx

1025

Arrays, DIMENSION at least max (1, n) each. Contain thereal and imaginary parts, respectively, of the computedeigenvalues, in the same order that they appear on thediagonal of the output real-Schur form T. Complex conjugatepairs of eigenvalues appear consecutively with theeigenvalue having positive imaginary part first.

COMPLEX for cgeesxwDOUBLE COMPLEX for zgeesx.Array, DIMENSION at least max(1, n). Contains the computedeigenvalues. The eigenvalues are stored in the same orderas they appear on the diagonal of the output Schur form T.

REAL for sgeesxvsDOUBLE PRECISION for dgeesxCOMPLEX for cgeesxDOUBLE COMPLEX for zgeesx.Array vs(ldvs,*);the second dimension of vs must be atleast max(1, n).If jobvs = 'V', vs contains the orthogonal/unitary matrixZ of Schur vectors.If jobvs = 'N', vs is not referenced.

REAL for single precision flavors DOUBLE PRECISION fordouble precision flavors.

rconde, rcondv

If sense = 'E' or 'B', rconde contains the reciprocalcondition number for the average of the selectedeigenvalues.If sense = 'N' or 'V', rconde is not referenced.If sense = 'V' or 'B', rcondv contains the reciprocalcondition number for the selected right invariant subspace.If sense = 'N' or 'E', rcondv is not referenced.


work(1)


i ≤ n:

1026


the QR algorithm failed to compute all the eigenvalues;elements 1:ilo-1 and i+1:n of wr and wi (for real flavors)or w (for complex flavors) contain those eigenvalues whichhave converged; if jobvs = 'V', vs contains thetransformation which reduces A to its partially convergedSchur form;i = n+1:the eigenvalues could not be reordered because someeigenvalues were too close to separate (the problem is veryill-conditioned);i = n+2:after reordering, roundoff changed values of some complexeigenvalues so that leading eigenvalues in the Schur formno longer satisfy select = .TRUE.. This could also becaused by underflow due to scaling.



Specific details for the routine geesx interface are the following:





Holds the matrix VS of size (n, n).vs

Restored based on the presence of the argument vs as follows:jobvsjobvs = 'V', if vs is present,jobvs = 'N', if vs is omitted.


Restored based on the presence of arguments rconde and rcondv asfollows:

sense

sense = 'B', if both rconde and rcondv are present,sense = 'E', if rconde is present and rcondv omitted,

1027


sense = 'V', if rconde is omitted and rcondv present,sense = 'N', if both rconde and rcondv are omitted.

Application Notes





?geevComputes the eigenvalues and left and righteigenvectors of a general matrix.

Syntax

Fortran 77:

call sgeev(jobvl, jobvr, n, a, lda, wr, wi, vl, ldvl, vr, ldvr, work, lwork,info)

call dgeev(jobvl, jobvr, n, a, lda, wr, wi, vl, ldvl, vr, ldvr, work, lwork,info)

call cgeev(jobvl, jobvr, n, a, lda, w, vl, ldvl, vr, ldvr, work, lwork, rwork,info)

call zgeev(jobvl, jobvr, n, a, lda, w, vl, ldvl, vr, ldvr, work, lwork, rwork,info)

1028


Fortran 95:

call geev(a, wr, wi [,vl] [,vr] [,info])

call geev(a, w [,vl] [,vr] [,info])

Description

This routine computes for an n-by-n real/complex nonsymmetric matrix A, the eigenvalues and,optionally, the left and/or right eigenvectors. The right eigenvector v(j) of A satisfies

A*v(j)= λ(j)*v(j)

where λ(j) is its eigenvalue.

The left eigenvector u(j) of A satisfies

u(j)H*A = λ(j)*u(j)H

where u(j)H denotes the conjugate transpose of u(j). The computed eigenvectors are normalizedto have Euclidean norm equal to 1 and largest component real.

Input Parameters

CHARACTER*1. Must be 'N' or 'V'.jobvlIf jobvl = 'N', then left eigenvectors of A are notcomputed.If jobvl = 'V', then left eigenvectors of A are computed.

CHARACTER*1. Must be 'N' or 'V'.jobvrIf jobvr = 'N', then right eigenvectors of A are notcomputed.If jobvr = 'V', then right eigenvectors of A are computed.


REAL for sgeeva, workDOUBLE PRECISION for dgeevCOMPLEX for cgeevDOUBLE COMPLEX for zgeev.Arrays:a(lda,*) is an array containing the n-by-n matrix A.The second dimension of a must be at least max(1, n).work is a workspace array, its dimension max(1, lwork).

1029


lda

INTEGER. The leading dimensions of the output arrays vland vr, respectively.

ldvl, ldvr

Constraints:

ldvl ≥ 1; ldvr ≥ 1.

If jobvl = 'V', ldvl ≥ max(1, n);

If jobvr = 'V', ldvr ≥ max(1, n).

INTEGER.lworkThe dimension of the array work.Constraint:

lwork ≥ max(1, 3n), and if jobvl = 'V' or jobvr ='V', lwork < max(1, 4n) (for real flavors);lwork < max(1, 2n) (for complex flavors).For good performance, lwork must generally be larger.If lwork = -1, then a workspace query is assumed; theroutine only calculates the optimal size of the work array,returns this value as the first entry of the work array, andno error message related to lwork is issued by xerbla.

REAL for cgeevrworkDOUBLE PRECISION for zgeevWorkspace array, DIMENSION at least max(1, 2n). Used incomplex flavors only.

Output Parameters

On exit, this array is overwritten by intermediate results.a

REAL for sgeevwr, wiDOUBLE PRECISION for dgeevArrays, DIMENSION at least max (1, n) each.Contain the real and imaginary parts, respectively, of thecomputed eigenvalues. Complex conjugate pairs ofeigenvalues appear consecutively with the eigenvalue havingpositive imaginary part first.

COMPLEX for cgeevwDOUBLE COMPLEX for zgeev.

1030

Array, DIMENSION at least max(1, n).Contains the computed eigenvalues.

REAL for sgeevvl, vrDOUBLE PRECISION for dgeevCOMPLEX for cgeevDOUBLE COMPLEX for zgeev.Arrays:vl(ldvl,*);the second dimension of vl must be at leastmax(1, n).If jobvl = 'V', the left eigenvectors u(j) are stored oneafter another in the columns of vl, in the same order astheir eigenvalues.If jobvl = 'N', vl is not referenced.For real flavors:If the j-th eigenvalue is real, then u(j) = vl(:,j), thej-th column of vl.If the j-th and (j+1)-st eigenvalues form a complexconjugate pair, then u(j) = vl(:,j) + i*vl(:,j+1)and u(j+1) = vl(:,j)- i*vl(:,j+1), where i =sqrt(-1).For complex flavors:u(j) = vl(:,j), the j-th column of vl.vr(ldvr,*); the second dimension of vr must be at leastmax(1, n).If jobvr = 'V', the right eigenvectors v(j) are stored oneafter another in the columns of vr, in the same order astheir eigenvalues.If jobvr = 'N', vr is not referenced.For real flavors:If the j-th eigenvalue is real, then v(j) = vr(:,j), thej-th column of vr.If the j-th and (j+1)-st eigenvalues form a complexconjugate pair, then v(j) = vr(:,j) + i*vr(:,j+1)and v(j+1) = vr(:,j) - i*vr(:,j+1), where i =sqrt(-1).For complex flavors:v(j) = vr(:,j), the j-th column of vr.

1031



work(1)

INTEGER.infoIf info = 0, the execution is successful.If info = -i, the ith parameter had an illegal value.If info = i, the QR algorithm failed to compute all theeigenvalues, and no eigenvectors have been computed;elements i+1:n of wr and wi (for real flavors) or w (forcomplex flavors) contain those eigenvalues which haveconverged.



Specific details for the routine geev interface are the following:





Holds the matrix VL of size (n, n).vl

Holds the matrix VR of size (n, n).vr

Restored based on the presence of the argument vl as follows:jobvljobvl = 'V', if vl is present,jobvl = 'N', if vl is omitted.

Restored based on the presence of the argument vr as follows:jobvrjobvr = 'V', if vr is present,jobvr = 'N', if vr is omitted.

Application Notes


1032





?geevxComputes the eigenvalues and left and righteigenvectors of a general matrix, with preliminarymatrix balancing, and computes reciprocalcondition numbers for the eigenvalues and righteigenvectors.

Syntax

Fortran 77:

call sgeevx(balanc, jobvl, jobvr, sense, n, a, lda, wr, wi, vl, ldvl, vr,ldvr, ilo, ihi, scale, abnrm, rconde, rcondv, work, lwork, iwork, info)

call dgeevx(balanc, jobvl, jobvr, sense, n, a, lda, wr, wi, vl, ldvl, vr,ldvr, ilo, ihi, scale, abnrm, rconde, rcondv, work, lwork, iwork, info)

call cgeevx(balanc, jobvl, jobvr, sense, n, a, lda, w, vl, ldvl, vr, ldvr,ilo, ihi, scale, abnrm, rconde, rcondv, work, lwork, rwork, info)

call zgeevx(balanc, jobvl, jobvr, sense, n, a, lda, w, vl, ldvl, vr, ldvr,ilo, ihi, scale, abnrm, rconde, rcondv, work, lwork, rwork, info)

Fortran 95:

call geevx(a, wr, wi [,vl] [,vr] [,balanc] [,ilo] [,ihi] [,scale] [,abnrm] [,rconde] [,rcondv] [,info])

call geevx(a, w [,vl] [,vr] [,balanc] [,ilo] [,ihi] [,scale] [,abnrm] [,rconde][, rcondv] [,info])

1033


Description

This routine computes for an n-by-n real/complex nonsymmetric matrix A, the eigenvalues and,optionally, the left and/or right eigenvectors.

Optionally also, it computes a balancing transformation to improve the conditioning of theeigenvalues and eigenvectors (ilo, ihi, scale, and abnrm), reciprocal condition numbers forthe eigenvalues (rconde), and reciprocal condition numbers for the right eigenvectors (rcondv).

The right eigenvector v(j) of A satisfies

A*v(j) = λ(j)*v(j)

where λ(j) is its eigenvalue.

The left eigenvector u(j) of A satisfies

u(j)H*A = λ(j)*u(j)H

where u(j)H denotes the conjugate transpose of u(j). The computed eigenvectors arenormalized to have Euclidean norm equal to 1 and largest component real.

Balancing a matrix means permuting the rows and columns to make it more nearly uppertriangular, and applying a diagonal similarity transformation D*A*inv(D), where D is a diagonalmatrix, to make its rows and columns closer in norm and the condition numbers of its eigenvaluesand eigenvectors smaller. The computed reciprocal condition numbers correspond to the balancedmatrix. Permuting rows and columns will not change the condition numbers in exact arithmetic)but diagonal scaling will. For further explanation of balancing, see [LUG], Section 4.10.

Input Parameters

CHARACTER*1. Must be 'N', 'P', 'S', or 'B'. Indicateshow the input matrix should be diagonally scaled and/orpermuted to improve the conditioning of its eigenvalues.

balanc

If balanc = 'N', do not diagonally scale or permute;If balanc = 'P', perform permutations to make the matrixmore nearly upper triangular. Do not diagonally scale;If balanc = 'S', diagonally scale the matrix, i.e. replaceA by D*A*inv(D), where D is a diagonal matrix chosen tomake the rows and columns of A more equal in norm. Donot permute;If balanc = 'B', both diagonally scale and permute A.

1034


Computed reciprocal condition numbers will be for the matrixafter balancing and/or permuting. Permuting does notchange condition numbers (in exact arithmetic), butbalancing does.

CHARACTER*1. Must be 'N' or 'V'.jobvlIf jobvl = 'N', left eigenvectors of A are not computed;If jobvl = 'V', left eigenvectors of A are computed.If sense = 'E' or 'B', then jobvl must be 'V'.

CHARACTER*1. Must be 'N' or 'V'.jobvrIf jobvr = 'N', right eigenvectors of A are not computed;If jobvr = 'V', right eigenvectors of A are computed.If sense = 'E' or 'B', then jobvr must be 'V'.


sense

If sense = 'N', none are computed;If sense = 'E', computed for eigenvalues only;If sense = 'V', computed for right eigenvectors only;If sense = 'B', computed for eigenvalues and righteigenvectors.If sense is 'E' or 'B', both left and right eigenvectors mustalso be computed (jobvl = 'V' and jobvr = 'V').


REAL for sgeevxa, workDOUBLE PRECISION for dgeevxCOMPLEX for cgeevxDOUBLE COMPLEX for zgeevx.Arrays:a(lda,*) is an array containing the n-by-n matrix A.The second dimension of a must be at least max(1, n).work is a workspace array, its dimension max(1, lwork).


lda

INTEGER. The leading dimensions of the output arrays vland vr, respectively.

ldvl, ldvr

Constraints:

ldvl ≥ 1; ldvr ≥ 1.

1035


If jobvl = 'V', ldvl ≥ max(1, n);

If jobvr = 'V', ldvr ≥ max(1, n).

INTEGER.lworkThe dimension of the array work.For real flavors:

If sense = 'N' or 'E', lwork ≥ max(1, 2n), and if jobvl

= 'V' or jobvr = 'V', lwork ≥ 3n;

If sense = 'V' or 'B', lwork ≥ n*(n+6).For good performance, lwork must generally be larger.For complex flavors:

If sense = 'N'or 'E', lwork ≥ max(1, 2n);

If sense = 'V' or 'B', lwork ≥ n2+2n. For goodperformance, lwork must generally be larger.If lwork = -1, then a workspace query is assumed; theroutine only calculates the optimal size of the work array,returns this value as the first entry of the work array, andno error message related to lwork is issued by xerbla.

REAL for cgeevxrworkDOUBLE PRECISION for zgeevxWorkspace array, DIMENSION at least max(1, 2n). Used incomplex flavors only.

INTEGER.iworkWorkspace array, DIMENSION at least max(1, 2n-2). Usedin real flavors only. Not referenced if sense = 'N' or 'E'.

Output Parameters

On exit, this array is overwritten.aIf jobvl = 'V' or jobvr = 'V', it contains thereal-Schur/Schur form of the balanced version of the inputmatrix A.

REAL for sgeevxwr, wiDOUBLE PRECISION for dgeevx

1036


Arrays, DIMENSION at least max (1, n) each. Contain thereal and imaginary parts, respectively, of the computedeigenvalues. Complex conjugate pairs of eigenvalues appearconsecutively with the eigenvalue having positive imaginarypart first.

COMPLEX for cgeevxwDOUBLE COMPLEX for zgeevx.Array, DIMENSION at least max(1, n). Contains the computedeigenvalues.

REAL for sgeevxvl, vrDOUBLE PRECISION for dgeevxCOMPLEX for cgeevxDOUBLE COMPLEX for zgeevx.Arrays:vl(ldvl,*); the second dimension of vl must be at leastmax(1, n).If jobvl = 'V', the left eigenvectors u(j) are stored oneafter another in the columns of vl, in the same order astheir eigenvalues.If jobvl = 'N', vl is not referenced.For real flavors:If the j-th eigenvalue is real, then u(j) = vl(:,j), thej-th column of vl.If the j-th and (j+1)-st eigenvalues form a complexconjugate pair, then u(j) = vl(:,j) + i*vl(:,j+1)and (j+1) = vl(:,j) - i*vl(:,j+1), where i =sqrt(-1).For complex flavors:u(j) = vl(:,j), the j-th column of vl.vr(ldvr,*); the second dimension of vr must be at leastmax(1, n).If jobvr = 'V', the right eigenvectors v(j) are stored oneafter another in the columns of vr, in the same order astheir eigenvalues.If jobvr = 'N', vr is not referenced.For real flavors:If the j-th eigenvalue is real, then v(j) = vr(:,j), thej-th column of vr.

1037


If the j-th and (j+1)-st eigenvalues form a complexconjugate pair, then v(j) = vr(:,j) + i*vr(:,j+1)and v(j+1) = vr(:,j) - i*vr(:,j+1), where i =sqrt(-1) .For complex flavors:v(j) = vr(:,j), the j-th column of vr.

INTEGER. ilo and ihi are integer values determined whenA was balanced.

ilo, ihi

The balanced A(i,j) = 0 if i > j and j = 1,..., ilo-1or i = ihi+1,..., n.If balanc = 'N' or 'S', ilo = 1 and ihi = n.

REAL for single-precision flavorsscaleDOUBLE PRECISION for double-precision flavors.Array, DIMENSION at least max(1, n). Details of thepermutations and scaling factors applied when balancing A.If P(j) is the index of the row and column interchanged withrow and column j, and D(j) is the scaling factor applied torow and column j, thenscale(j) = P(j), for j = 1,...,ilo-1= D(j), for j = ilo,...,ihi= P(j) for j = ihi+1,..., n.The order in which the interchanges are made is n to ihi+1,then 1 to ilo-1.

REAL for single-precision flavorsabnrmDOUBLE PRECISION for double-precision flavors.The one-norm of the balanced matrix (the maximum of thesum of absolute values of elements of any column).


rconde, rcondv

Arrays, DIMENSION at least max(1, n) each.rconde(j) is the reciprocal condition number of the j-theigenvalue.rcondv(j) is the reciprocal condition number of the j-th righteigenvector.


work(1)

INTEGER.info

1038


If info = 0, the execution is successful.If info = -i, the ith parameter had an illegal value.If info = i, the QR algorithm failed to compute all theeigenvalues, and no eigenvectors or condition numbers havebeen computed; elements 1:ilo-1 and i+1:n of wr and wi(for real flavors) or w (for complex flavors) containeigenvalues which have converged.



Specific details for the routine geevx interface are the following:








Holds the vector of length (n).rconde

Holds the vector of length (n).rcondv

Must be 'N', 'B', 'P' or 'S'. The default value is 'N'.balanc




sense

sense = 'B', if both rconde and rcondv are present,sense = 'E', if rconde is present and rcondv omitted,sense = 'V', if rconde is omitted and rcondv present,

1039


sense = 'N', if both rconde and rcondv are omitted.

Application Notes






This section describes LAPACK driver routines used for solving singular value problems. Seealso computational routines computational routines that can be called to solve these problems.Table 4-12 lists all such driver routines for Fortran-77 interface. Respective routine names inFortran-95 interface are without the first symbol (see Routine Naming Conventions).

Table 4-12 Driver Routines for Singular Value Decomposition


Computes the singular value decomposition of a general rectangular matrix.?gesvd

Computes the singular value decomposition of a general rectangular matrixusing a divide and conquer method.

?gesdd

Computes the generalized singular value decomposition of a pair of generalrectangular matrices.

?ggsvd

1040


?gesvdComputes the singular value decomposition of ageneral rectangular matrix.

Syntax

Fortran 77:

call sgesvd(jobu, jobvt, m, n, a, lda, s, u, ldu, vt, ldvt, work, lwork, info)

call dgesvd(jobu, jobvt, m, n, a, lda, s, u, ldu, vt, ldvt, work, lwork, info)

call cgesvd(jobu, jobvt, m, n, a, lda, s, u, ldu, vt, ldvt, work, lwork, rwork,info)

call zgesvd(jobu, jobvt, m, n, a, lda, s, u, ldu, vt, ldvt, work, lwork, rwork,info)

Fortran 95:

call gesvd(a, s [,u] [,vt] [,ww] [,job] [,info])

Description

This routine computes the singular value decomposition (SVD) of a real/complex m-by-n matrixA, optionally computing the left and/or right singular vectors. The SVD is written

A = U*ΣVH

where Σ is an m-by-n matrix which is zero except for its min(m,n) diagonal elements, U is anm-by-m orthogonal/unitary matrix, and V is an n-by-n orthogonal/unitary matrix. The diagonal

elements of Σ are the singular values of A; they are real and non-negative, and are returnedin descending order. The first min(m, n) columns of U and V are the left and right singular vectorsof A.

Note that the routine returns VH, not V.

Input Parameters

CHARACTER*1. Must be 'A', 'S', 'O', or 'N'. Specifiesoptions for computing all or part of the matrix U.

jobu

If jobu = 'A', all m columns of U are returned in the arrayu;

1041


if jobu = 'S', the first min(m, n) columns of U (the leftsingular vectors) are returned in the array u;if jobu = 'O', the first min(m, n) columns of U (the leftsingular vectors) are overwritten on the array a;if jobu = 'N', no columns of U (no left singular vectors)are computed.

CHARACTER*1. Must be 'A', 'S', 'O', or 'N'. Specifiesoptions for computing all or part of the matrix VH.

jobvt

If jobvt = 'A', all n rows of VH are returned in the arrayvt;if jobvt = 'S', the first min(m,n) rows of VH (the rightsingular vectors) are returned in the array vt;if jobvt = 'O', the first min(m,n) rows of VH (the rightsingular vectors) are overwritten on the array a;if jobvt = 'N', no rows of VH (no right singular vectors)are computed.jobvt and jobu cannot both be 'O'.



REAL for sgesvda, workDOUBLE PRECISION for dgesvdCOMPLEX for cgesvdDOUBLE COMPLEX for zgesvd.Arrays:a(lda,*) is an array containing the m-by-n matrix A.The second dimension of a must be at least max(1, n).work is a workspace array, its dimension max(1, lwork).

INTEGER. The first dimension of the array a.ldaMust be at least max(1, m).

INTEGER. The leading dimensions of the output arrays uand vt, respectively.

ldu, ldvt

Constraints:

ldu ≥ 1; ldvt < 1.

If jobu = 'S' or 'A', ldu ≥ m;

If jobvt = 'A', ldvt≥ n;

1042

If jobvt = 'S', ldvt ≥ min(m, n).

INTEGER.lwork

The dimension of the array work; lwork ≥ 1.Constraints:

lwork ≥ max(3*min(m, n)+max(m, n), 5*min(m,n))(for real flavors);

lwork ≥ 2*min(m, n)+max(m, n) (for complex flavors).For good performance, lwork must generally be larger.If lwork = -1, then a workspace query is assumed; theroutine only calculates the optimal size of the work array,returns this value as the first entry of the work array, andno error message related to lwork is issued by xerbla. SeeApplication Notes for details.

REAL for cgesvdrworkDOUBLE PRECISION for zgesvdWorkspace array, DIMENSION at least max(1, 5*min(m, n)).Used in complex flavors only.

Output Parameters

On exit,aIf jobu = 'O', a is overwritten with the first min(m,n)columns of U (the left singular vectors, stored columnwise);If jobvt = 'O', a is overwritten with the first min(m, n)rows of VH (the right singular vectors, stored rowwise);

If jobu≠'O' and jobvt≠'O', the contents of a aredestroyed.


s

Array, DIMENSION at least max(1, min(m,n)). Contains thesingular values of A sorted so that s(i) < s(i+1).

REAL for sgesvdu, vtDOUBLE PRECISION for dgesvdCOMPLEX for cgesvdDOUBLE COMPLEX for zgesvd.Arrays:

1043

u(ldu,*); the second dimension of u must be at least max(1,m) if jobu = 'A', and at least max(1, min(m, n)) if jobu= 'S'.If jobu = 'A', u contains the m-by-m orthogonal/unitarymatrix U.If jobu = 'S', u contains the first min(m, n) columns of U(the left singular vectors, stored columnwise).If jobu = 'N' or 'O', u is not referenced.vt(ldvt,*); the second dimension of vt must be at leastmax(1, n).If jobvt = 'A', vt contains the n-by-n orthogonal/unitarymatrix VH.If jobvt = 'S', vt contains the first min(m, n) rows of VH

(the right singular vectors, stored rowwise).If jobvt = 'N'or 'O', vt is not referenced.


work

For real flavors:If info > 0, work(2:min(m,n)) contains the unconvergedsuperdiagonal elements of an upper bidiagonal matrix Bwhose diagonal is in s (not necessarily sorted). B satisfiesA = u * B * vt, so it has the same singular values as A,and singular vectors related by u and vt.

On exit (for complex flavors), if info > 0,rwork(1:min(m,n)-1) contains the unconvergedsuperdiagonal elements of an upper bidiagonal matrix B

rwork

whose diagonal is in s (not necessarily sorted). B satisfiesA = u * B * vt, so it has the same singular values as A,and singular vectors related by u and vt.

INTEGER.infoIf info = 0, the execution is successful.If info = -i, the i-th parameter had an illegal value.If info = i, then if ?bdsqr did not converge, i specifieshow many superdiagonals of the intermediate bidiagonalform B did not converge to zero.

1044




Specific details for the routine gesvd interface are the following:


Holds the vector of length min(m, n).s

Holds the matrix U of size (m,min(m, n)).u

Holds the matrix VT of size (min(m, n),n).vt

Holds the vector of length (min(m, n)-1).ww

Holds the vector of length min(m, n)-1. ww contains the unconvergedsuperdiagonal elements of an upper bidiagonal matrix B whose diagonalis in s (not necessarily sorted). B satisfies A = U * B * VT, so it hasthe same singular values as A, and singular vectors related by U andVT.

ww

Must be either 'N', or 'U', or 'V'. The default value is 'N'.jobIf job = 'U', and u is not present, then u is returned in the array a.If job = 'V', and vt is not present, then vt is returned in the arraya.

Restored based on the presence of the argument u, value of job andsizes of arrays u and a as follows:

jobu

jobu = 'A', if u is present and the number of columns in u is equalto the number of rows in a,jobu = 'S', if u is present and the number of columns in u is not equalto the number of rows in a,jobu = 'O', if u is not present and job is equal to 'U',jobu = 'N', if u is not present and job is not equal to 'U'.

Restored based on the presence of the argument vt, value of job andsizes of arrays vt and a as follows:

jobvt

jobvt = 'A', if vt is present and the number of columns in vt is equalto the number of rows in a,jobvt = 'S', if vt is present and the number of columns in vt is notequal to the number of rows in a,jobvt = 'O', if vt is not present and job is equal to 'V',jobvt = 'N', if vt is not present and job is not equal to 'V',

1045


Application Notes





?gesddComputes the singular value decomposition of ageneral rectangular matrix using a divide andconquer method.

Syntax

Fortran 77:

call sgesdd(jobz, m, n, a, lda, s, u, ldu, vt, ldvt, work, lwork, iwork, info)

call dgesdd(jobz, m, n, a, lda, s, u, ldu, vt, ldvt, work, lwork, iwork, info)

call cgesdd(jobz, m, n, a, lda, s, u, ldu, vt, ldvt, work, lwork, rwork, iwork,info)

call zgesdd(jobz, m, n, a, lda, s, u, ldu, vt, ldvt, work, lwork, rwork, iwork,info)

Fortran 95:

call gesdd(a, s [,u] [,vt] [,jobz] [,info])

Description

This routine computes the singular value decomposition (SVD) of a real/complex m-by-n matrixA, optionally computing the left and/or right singular vectors.

1046


If singular vectors are desired, it uses a divide and conquer algorithm. The SVD is written

A = U*ΣVH,

where Σ is an m-by-n matrix which is zero except for its min(m,n) diagonal elements, U is anm-by-m orthogonal/unitary matrix, and V is an n-by-n orthogonal/unitary matrix. The diagonal

elements of Σ are the singular values of A; they are real and non-negative, and are returnedin descending order. The first min(m, n) columns of U and V are the left and right singular vectorsof A.

Note that the routine returns VH, not V.

Input Parameters

CHARACTER*1. Must be 'A', 'S', 'O', or 'N'.jobzSpecifies options for computing all or part of the matrix U.If jobz = 'A', all m columns of U and all n rows of VT arereturned in the arrays u and vt;if jobz = 'S', the first min(m, n) columns of U and the firstmin(m, n) rows of VT are returned in the arrays u and vt;if jobz = 'O', then

if m ≥ n, the first n columns of U are overwritten in the arraya and all rows of VT are returned in the array vt;

if m ≥ n, all columns of U are returned in the array u and thefirst m rows of VT are overwritten in the array a;if jobz = 'N', no columns of U or rows of VT are computed.



REAL for sgesdda, workDOUBLE PRECISION for dgesddCOMPLEX for cgesddDOUBLE COMPLEX for zgesdd.Arrays: a(lda,*) is an array containing the m-by-n matrixA.The second dimension of a must be at least max(1, n).work is a workspace array, its dimension max(1, lwork).

INTEGER. The first dimension of the array a. Must be atleast max(1, m).

lda

1047


INTEGER. The leading dimensions of the output arrays uand vt, respectively.

ldu, ldvt

Constraints:

ldu ≥ 1; ldvt ≥ 1.If jobz = 'S' or 'A', or jobz = 'O' and m < n,

then ldu ≥ m;If jobz = 'A' or jobz = 'O' and m < n,

then ldvt ≥ n;

If jobz = 'S', ldvt ≥ min(m, n).

INTEGER.lwork

The dimension of the array work; lwork ≥ 1.If lwork = -1, then a workspace query is assumed; theroutine only calculates the optimal size of the work array,returns this value as the work(1), and no error messagerelated to lwork is issued by xerbla.See Application Notes for the suggested value of lwork.

REAL for cgesddrworkDOUBLE PRECISION for zgesddWorkspace array, DIMENSION at least max(1, 5*min(m,n))if jobz = 'N'.Otherwise, the dimension of rwork must be at least max(1,5*(min(m,n))2 + 7*min(m,n)). This array is used incomplex flavors only.

INTEGER. Workspace array, DIMENSION at least max(1, 8*min(m, n)).

iwork

Output Parameters

On exit:a

If jobz = 'O', then if m≥ n, a is overwritten with the firstn columns of U (the left singular vectors, stored columnwise).If m < n, a is overwritten with the first m rows of VT (the rightsingular vectors, stored rowwise);

If jobz≠'O', the contents of a are destroyed.


s

1048

Array, DIMENSION at least max(1, min(m,n)). Contains the

singular values of A sorted so that s(i) ≥ s(i+1).

REAL for sgesddu, vtDOUBLE PRECISION for dgesddCOMPLEX for cgesddDOUBLE COMPLEX for zgesdd.Arrays:u(ldu,*); the second dimension of u must be at least max(1,m) if jobz = 'A' or jobz = 'O' and m < n.If jobz = 'S', the second dimension of u must be at leastmax(1, min(m, n)).If jobz = 'A'or jobz = 'O' and m < n, u contains them-by-m orthogonal/unitary matrix U.If jobz = 'S', u contains the first min(m, n) columns of U(the left singular vectors, stored columnwise).If jobz = 'O' and m < n, or jobz = 'N', u is notreferenced.vt(ldvt,*); the second dimension of vt must be at leastmax(1, n).

If jobz = 'A'or jobz = 'O' and m≥n, vt contains then-by-n orthogonal/unitary matrix VT.If jobz = 'S', vt contains the first min(m, n) rows of VT

(the right singular vectors, stored rowwise).If jobz = 'O' and m < n, or jobz = 'N', vt is notreferenced.


work(1)

INTEGER.infoIf info = 0, the execution is successful.If info = -i, the i-th parameter had an illegal value.If info = i, then ?bdsdc did not converge, updatingprocess failed.



1049

Specific details for the routine gesdd interface are the following:


Holds the vector of length min(m, n).s

Holds the matrix U of size (m,min(m, n)).u

Holds the matrix VT of size (min(m, n),n).vt

Must be 'N', 'A', 'S', or 'O'. The default value is 'N'.job

Application Notes

For real flavors:

If jobz = 'N', lwork ≥ 3*min(m, n) + max (max(m,n), 6*min(m, n));

If jobz = 'O', lwork ≥ 3*(min(m, n))2 +

max (max(m, n), 5*(min(m, n))2 + 4*min(m, n));

If jobz = 'S' or 'A', lwork < 3*(min(m, n))2 +

max (max(m, n), 4*(min(m, n))2 + 4*min(m, n)).


If jobz = 'N', lwork ≥ 2*min(m, n) + max(m, n);

If jobz = 'O', lwork ≥ 2*(min(m, n))2 + max(m, n) + 2*min(m, n);

If jobz = 'S' or 'A', lwork ≥ (min(m, n))2 + max(m, n) + 2*min(m, n);





1050


?ggsvdComputes the generalized singular valuedecomposition of a pair of general rectangularmatrices.

Syntax

Fortran 77:

call sggsvd(jobu, jobv, jobq, m, n, p, k, l, a, lda, b, ldb, alpha, beta, u,ldu, v, ldv, q, ldq, work, iwork, info)

call dggsvd(jobu, jobv, jobq, m, n, p, k, l, a, lda, b, ldb, alpha, beta, u,ldu, v, ldv, q, ldq, work, iwork, info)

call cggsvd(jobu, jobv, jobq, m, n, p, k, l, a, lda, b, ldb, alpha, beta, u,ldu, v, ldv, q, ldq, work, rwork, iwork, info)

call zggsvd(jobu, jobv, jobq, m, n, p, k, l, a, lda, b, ldb, alpha, beta, u,ldu, v, ldv, q, ldq, work, rwork, iwork, info)

Fortran 95:

call ggsvd(a, b, alpha, beta [, k] [,l] [,u] [,v] [,q] [,iwork] [,info])

Description

This routine computes the generalized singular value decomposition (GSVD) of an m-by-nreal/complex matrix A and p-by-n real/complex matrix B:

UH*A*Q = D1*(0 R), VH*B*Q = D2*(0 R),

where U, V and Q are orthogonal/unitary matrices.

Let k+l = the effective numerical rank of the matrix (AH, BH)H, then R is a (k+l)-by-(k+l)nonsingular upper triangular matrix, D1 and D2 are m-by-(k+l) and p-by-(k+l) “diagonal”matrices and of the following structures, respectively:

If m-k-l ≥0,

1051


where

C = diag(alpha(K+1),..., alpha(K+l))

S = diag(beta(K+1),...,beta(K+l))

C2 + S2 = I

R is stored in a(1:k+l, n-k-l+1:n ) on exit.

If m-k-l < 0,

1052

where

C = diag(alpha(K+1),..., alpha(m)),

S = diag(beta(K+1),...,beta(m)),

C2 + S2 = I

On exit, is stored in a(1:m, n-k-l+1:n ) and R33 is stored in b(m-k+1:l,n+m-k-l+1:n ).

The routine computes C, S, R, and optionally the orthogonal/unitary transformation matricesU, V and Q.

1053


In particular, if B is an n-by-n nonsingular matrix, then the GSVD of A and B implicitly givesthe SVD of AB-1:

A*B-1 = U(*D1 D2-1)*VH.

If (AH, BH)H has orthonormal columns, then the GSVD of A and B is also equal to the CSdecomposition of A and B. Furthermore, the GSVD can be used to derive the solution of theeigenvalue problem:

AH*A λx = λ BH*B x.

Input Parameters

CHARACTER*1. Must be 'U' or 'N'.jobuIf jobu = 'U', orthogonal/unitary matrix U is computed.If jobu = 'N', U is not computed.

CHARACTER*1. Must be 'V' or 'N'.jobvIf jobv = 'V', orthogonal/unitary matrix V is computed.If jobv = 'N', V is not computed.

CHARACTER*1. Must be 'Q' or 'N'.jobqIf jobq = 'Q', orthogonal/unitary matrix Q is computed.If jobq = 'N', Q is not computed.



(n ≥ 0).

n


REAL for sggsvda, b, workDOUBLE PRECISION for dggsvdCOMPLEX for cggsvdDOUBLE COMPLEX for zggsvd.Arrays:a(lda,*) contains the m-by-n matrix A.The second dimension of a must be at least max(1, n).b(ldb,*) contains the p-by-n matrix B.The second dimension of b must be at least max(1, n).work(*) is a workspace array.The dimension of work must be at least max(3n, m, p)+n.

1054


INTEGER. The first dimension of the array u .ldu

ldu ≥ max(1, m) if jobu = 'U'; ldu ≥ 1 otherwise.

INTEGER. The first dimension of the array v .ldv

ldv ≥ max(1, p) if jobv = 'V'; ldv ≥ 1 otherwise.

INTEGER. The first dimension of the array q .ldq

ldq ≥ max(1, n) if jobq = 'Q'; ldq ≥ 1 otherwise.


REAL for cggsvd DOUBLE PRECISION for zggsvd.rworkWorkspace array, DIMENSION at least max(1, 2n). Used incomplex flavors only.

Output Parameters

INTEGER. On exit, k and l specify the dimension of thesubblocks. The sum k+l is equal to the effective numericalrank of (AH, BH)H.

k, l

On exit, a contains the triangular matrix R or part of R.a

On exit, b contains part of the triangular matrix R if m-k-l< 0.

b

REAL for single-precision flavorsalpha, betaDOUBLE PRECISION for double-precision flavors.Arrays, DIMENSION at least max(1, n) each.Contain the generalized singular value pairs of A and B:alpha(1:k) = 1,beta(1:k) = 0,

and if m-k-l ≥ 0,alpha(k+1:k+l) = C,beta(k+1:k+l) = S,or if m-k-l < 0,alpha(k+1:m)= C, alpha(m+1:k+l)=0beta(k+1:m) = S, beta(m+1:k+l) = 1and

1055

alpha(k+l+1:n) = 0beta(k+l+1:n) = 0.

REAL for sggsvdu, v, qDOUBLE PRECISION for dggsvdCOMPLEX for cggsvdDOUBLE COMPLEX for zggsvd.Arrays:u(ldu,*); the second dimension of u must be at least max(1,m).If jobu = 'U', u contains the m-by-m orthogonal/unitarymatrix U.If jobu = 'N', u is not referenced.v(ldv,*); the second dimension of v must be at least max(1,p).If jobv = 'V', v contains the p-by-p orthogonal/unitarymatrix V.If jobv = 'N', v is not referenced.q(ldq,*); the second dimension of q must be at least max(1,n).If jobq = 'Q', q contains the n-by-n orthogonal/unitarymatrix Q.If jobq = 'N', q is not referenced.

On exit, iwork stores the sorting information.iwork

INTEGER.infoIf info = 0, the execution is successful.If info = -i, the i-th parameter had an illegal value.If info = 1, the Jacobi-type procedure failed to converge.For further details, see subroutine ?tgsja.



Specific details for the routine ggsvd interface are the following:


Holds the matrix B of size (p, n).b

1056


Holds the vector of length (n).alpha


Holds the matrix U of size (m, m).u

Holds the matrix V of size (p, p).v


Holds the vector of length (n).iwork

Restored based on the presence of the argument u as follows:jobujobu = 'U', if u is present, jobu = 'N', if u is omitted.

Restored based on the presence of the argument v as follows:jobvjobz = 'V', if v is present,jobz = 'N', if v is omitted.

Restored based on the presence of the argument q as follows:jobqjobz = 'Q', if q is present,jobz = 'N', if q is omitted.

Generalized Symmetric Definite Eigenproblems

This section describes LAPACK driver routines used for solving generalized symmetric definiteeigenproblems. See also computational routines that can be called to solve these problems.Table 4-13 lists all such driver routines for Fortran-77 interface. Respective routine names inFortran-95 interface are without the first symbol (see Routine Naming Conventions).

Table 4-13 Driver Routines for Solving Generalized Symmetric Definite Eigenproblems


Computes all eigenvalues and, optionally, eigenvectors of a real / complexgeneralized symmetric /Hermitian definite eigenproblem.

?sygv/?hegv

Computes all eigenvalues and, optionally, eigenvectors of a real / complexgeneralized symmetric /Hermitian definite eigenproblem. If eigenvectorsare desired, it uses a divide and conquer method.

?sygvd/?hegvd

Computes selected eigenvalues and, optionally, eigenvectors of a real /complex generalized symmetric /Hermitian definite eigenproblem.

?sygvx/?hegvx

Computes all eigenvalues and, optionally, eigenvectors of a real / complexgeneralized symmetric /Hermitian definite eigenproblem with matricesin packed storage.

?spgv/?hpgv

1057



Computes all eigenvalues and, optionally, eigenvectors of a real / complexgeneralized symmetric /Hermitian definite eigenproblem with matricesin packed storage. If eigenvectors are desired, it uses a divide andconquer method.

?spgvd/?hpgvd

Computes selected eigenvalues and, optionally, eigenvectors of a real /complex generalized symmetric /Hermitian definite eigenproblem withmatrices in packed storage.

?spgvx/?hpgvx

Computes all eigenvalues and, optionally, eigenvectors of a real / complexgeneralized symmetric /Hermitian definite eigenproblem with bandedmatrices.

?sbgv/?hbgv

Computes all eigenvalues and, optionally, eigenvectors of a real / complexgeneralized symmetric /Hermitian definite eigenproblem with bandedmatrices. If eigenvectors are desired, it uses a divide and conquermethod.

?sbgvd/?hbgvd

Computes selected eigenvalues and, optionally, eigenvectors of a real /complex generalized symmetric /Hermitian definite eigenproblem withbanded matrices.

?sbgvx/?hbgvx

?sygvComputes all eigenvalues and, optionally,eigenvectors of a real generalized symmetricdefinite eigenproblem.

Syntax

Fortran 77:

call ssygv(itype, jobz, uplo, n, a, lda, b, ldb, w, work, lwork, info)

call dsygv(itype, jobz, uplo, n, a, lda, b, ldb, w, work, lwork, info)

Fortran 95:

call sygv(a, b, w [,itype] [,jobz] [,uplo] [,info])

1058


Description

This routine computes all the eigenvalues, and optionally, the eigenvectors of a real generalizedsymmetric-definite eigenproblem, of the form

Ax = λBx, ABx = λx, or B Ax = λx .

Here A and B are assumed to be symmetric and B is also positive definite.

Input Parameters

INTEGER. Must be 1 or 2 or 3.itypeSpecifies the problem type to be solved:if itype = 1, the problem type is A*x = lambda*B*x;if itype = 2, the problem type is A*B*x = lambda*x;if itype = 3, the problem type is B*A*x = lambda*x.

CHARACTER*1. Must be 'N' or 'V'.jobzIf jobz = 'N', then compute eigenvalues only.If jobz = 'V', then compute eigenvalues and eigenvectors.

CHARACTER*1. Must be 'U' or 'L'.uploIf uplo = 'U', arrays a and b store the upper triangles ofA and B;If uplo = 'L', arrays a and b store the lower triangles ofA and B.


REAL for ssygva, b, workDOUBLE PRECISION for dsygv.Arrays:a(lda,*) contains the upper or lower triangle of thesymmetric matrix A, as specified by uplo.The second dimension of a must be at least max(1, n).b(ldb,*) contains the upper or lower triangle of thesymmetric positive definite matrix B, as specified by uplo.The second dimension of b must be at least max(1, n).work is a workspace array, its dimension max(1,lwork).



1059


INTEGER.lworkThe dimension of the array work;

lwork ≥ max(1, 3n-1).If lwork = -1, then a workspace query is assumed; theroutine only calculates the optimal size of the work array,returns this value as the first entry of the work array, andno error message related to lwork is issued by xerbla.See Application Notes for the suggested value of lwork.

Output Parameters

On exit, if jobz = 'V', then if info = 0, a contains thematrix Z of eigenvectors. The eigenvectors are normalizedas follows:

a

if itype = 1 or 2, ZT*B*Z = I;if itype = 3, ZT*inv(B)*Z = I;If jobz = 'N', then on exit the upper triangle (if uplo ='U') or the lower triangle (if uplo = 'L') of A, includingthe diagonal, is destroyed.

On exit, if info ≤ n, the part of b containing the matrix isoverwritten by the triangular factor U or L from the Choleskyfactorization B = UT*U or B = L*LT.

b

REAL for ssygvwDOUBLE PRECISION for dsygv.Array, DIMENSION at least max(1, n).If info = 0, contains the eigenvalues in ascending order.


work(1)

INTEGER.infoIf info = 0, the execution is successful.If info = -i, the i-th argument had an illegal value.If info > 0, spotrf/dpotrf and ssyev/dsyev returnedan error code:

If info = i ≤ n, ssyev/dsyev failed to converge, and i

off-diagonal elements of an intermediate tridiagonal did notconverge to zero;

1060


If info = n + i, for 1 ≤ i ≤ n, then the leading minorof order i of B is not positive-definite. The factorization ofB could not be completed and no eigenvalues or eigenvectorswere computed.



Specific details for the routine sygv interface are the following:


Holds the matrix B of size (n, n).b





Application Notes

For optimum performance use lwork ≥ (nb+2)*n, where nb is the blocksize for ssytrd/dsytrdreturned by ilaenv.





1061


?hegvComputes all eigenvalues and, optionally,eigenvectors of a complex generalized Hermitiandefinite eigenproblem.

Syntax

Fortran 77:

call chegv(itype, jobz, uplo, n, a, lda, b, ldb, w, work, lwork, rwork, info)

call zhegv(itype, jobz, uplo, n, a, lda, b, ldb, w, work, lwork, rwork, info)

Fortran 95:

call hegv(a, b, w [,itype] [,jobz] [,uplo] [,info])

Description

This routine computes all the eigenvalues, and optionally, the eigenvectors of a complexgeneralized Hermitian-definite eigenproblem, of the form

Ax = λ Bx, ABx = λ x, or B Ax = λx .

Here A and B are assumed to be Hermitian and B is also positive definite.

Input Parameters

INTEGER. Must be 1 or 2 or 3. Specifies the problem typeto be solved:

itype

if itype = 1, the problem type is A*x = lambda*B*x;if itype = 2, the problem type is A*B*x = lambda*x;if itype = 3, the problem type is B*A*x = lambda*x.



1062



COMPLEX for chegva, b, workDOUBLE COMPLEX for zhegv.Arrays:a(lda,*) contains the upper or lower triangle of theHermitian matrix A, as specified by uplo.The second dimension of a must be at least max(1, n).b(ldb,*) contains the upper or lower triangle of theHermitian positive definite matrix B, as specified by uplo.The second dimension of b must be at least max(1, n).work is a workspace array, its dimension max(1, lwork).



INTEGER.lwork

The dimension of the array work; lwork ≥ max(1, 2n-1).If lwork = -1, then a workspace query is assumed; theroutine only calculates the optimal size of the work array,returns this value as the first entry of the work array, andno error message related to lwork is issued by xerbla.See Application Notes for the suggested value of lwork.

REAL for chegvrworkDOUBLE PRECISION for zhegv.Workspace array, DIMENSION at least max(1, 3n-2).

Output Parameters


a

if itype = 1 or 2, ZH*B*Z = I;if itype = 3, ZH*inv(B)*Z = I;If jobz = 'N', then on exit the upper triangle (if uplo ='U') or the lower triangle (if uplo = 'L') of A, includingthe diagonal, is destroyed.

1063


On exit, if info ≤ n, the part of b containing the matrix isoverwritten by the triangular factor U or L from the Choleskyfactorization B = UH*U or B = L*LH.

b

REAL for chegvwDOUBLE PRECISION for zhegv.Array, DIMENSION at least max(1, n).If info = 0, contains the eigenvalues in ascending order.


work(1)

INTEGER.infoIf info = 0, the execution is successful.If info = -i, the i-th argument had an illegal value.If info > 0, cpotrf/zpotrf and cheev/zheev returnedan error code:

If info = i ≤ n, cheev/zheev failed to converge, and i





Specific details for the routine hegv interface are the following:







1064


Application Notes

For optimum performance use lwork ≥ (nb+1)*n, where nb is the blocksize for chetrd/zhetrdreturned by ilaenv.





?sygvdComputes all eigenvalues and, optionally,eigenvectors of a real generalized symmetricdefinite eigenproblem. If eigenvectors are desired,it uses a divide and conquer method.

Syntax

Fortran 77:

call ssygvd(itype, jobz, uplo, n, a, lda, b, ldb, w, work, lwork, iwork,liwork, info)

call dsygvd(itype, jobz, uplo, n, a, lda, b, ldb, w, work, lwork, iwork,liwork, info)

Fortran 95:

call sygvd(a, b, w [,itype] [,jobz] [,uplo] [,info])

1065


Description



Here A and B are assumed to be symmetric and B is also positive definite.

If eigenvectors are desired, it uses a divide and conquer algorithm.

Input Parameters


itype





REAL for ssygvda, b, workDOUBLE PRECISION for dsygvd.Arrays:a(lda,*) contains the upper or lower triangle of thesymmetric matrix A, as specified by uplo.The second dimension of a must be at least max(1, n).b(ldb,*) contains the upper or lower triangle of thesymmetric positive definite matrix B, as specified by uplo.The second dimension of b must be at least max(1, n).work is a workspace array, its dimension max(1, lwork).



1066


If n ≤ 1, lwork ≥ 1;If jobz = 'N' and n>1, lwork < 2n+1;If jobz = 'V' and n>1, lwork < 2n2+6n+1.If lwork = -1, then a workspace query is assumed; theroutine only calculates the required sizes of the work andiwork arrays, returns these values as the first entries of thework and iwork arrays, and no error message related tolwork or liwork is issued by xerbla. See Application Notesfor details.

INTEGER.iworkWorkspace array, its dimension max(1, lwork).


If n ≤ 1, liwork ≥ 1;

If jobz = 'N' and n>1, liwork ≥ 1;

If jobz = 'V' and n>1, liwork ≥ 5n+3.If liwork = -1, then a workspace query is assumed; theroutine only calculates the required sizes of the work andiwork arrays, returns these values as the first entries of thework and iwork arrays, and no error message related tolwork or liwork is issued by xerbla. See Application Notesfor details.

Output Parameters


a

if itype = 1 or 2, ZT*B*Z = I;if itype = 3, ZT*inv(B)*Z = I;If jobz = 'N', then on exit the upper triangle (if uplo ='U') or the lower triangle (if uplo = 'L') of A, includingthe diagonal, is destroyed.

1067


b

REAL for ssygvdwDOUBLE PRECISION for dsygvd.Array, DIMENSION at least max(1, n).If info = 0, contains the eigenvalues in ascending order.


work(1)


iwork(1)

INTEGER.infoIf info = 0, the execution is successful.If info = -i, the i-th argument had an illegal value.

If info > 0= i ≤ n, and jobz = 'N', then the algorithmfailed to converge; i off-diagonal elements of anintermediate tridiagonal form did not converge to zero; ifjobz = 'V', then the algorithm failed to compute aneigenvalue while working on the submatrix lying in rowsand columns info/(n+1) through mod(info,n+1);

If info > 0= n + i, for 1 ≤ i ≤ n, then the leading minorof order i of B is not positive-definite. The factorization ofB could not be completed and no eigenvalues or eigenvectorswere computed.



Specific details for the routine sygvd interface are the following:





1068




Application Notes





?hegvdComputes all eigenvalues and, optionally,eigenvectors of a complex generalized Hermitiandefinite eigenproblem. If eigenvectors are desired,it uses a divide and conquer method.

Syntax

Fortran 77:

call chegvd(itype, jobz, uplo, n, a, lda, b, ldb, w, work, lwork, rwork,lrwork, iwork, liwork, info)

call zhegvd(itype, jobz, uplo, n, a, lda, b, ldb, w, work, lwork, rwork,lrwork, iwork, liwork, info)

Fortran 95:

call hegvd(a, b, w [,itype] [,jobz] [,uplo] [,info])

1069


Description



Here A and B are assumed to be Hermitian and B is also positive definite.


Input Parameters


itype





COMPLEX for chegvda, b, workDOUBLE COMPLEX for zhegvd.Arrays:a(lda,*) contains the upper or lower triangle of theHermitian matrix A, as specified by uplo.The second dimension of a must be at least max(1, n).b(ldb,*) contains the upper or lower triangle of theHermitian positive definite matrix B, as specified by uplo.The second dimension of b must be at least max(1, n).work is a workspace array, its dimension max(1, lwork).



1070


If n ≤ 1, lwork ≥ 1;

If jobz = 'N' and n>1, lwork ≥ n+1;

If jobz = 'V' and n>1, lwork ≥ n2+2n.If lwork = -1, then a workspace query is assumed; theroutine only calculates the optimal size of the work, rworkand iwork arrays, returns these values as the first entriesof the work, rwork and iwork arrays, and no error messagerelated to lwork or lrwork or liwork is issued by xerbla.See Application Notes for details.

REAL for chegvdrworkDOUBLE PRECISION for zhegvd.Workspace array, DIMENSION max(1, lrwork).


If n ≤ 1, lrwork ≥ 1;

If jobz = 'N' and n>1, lrwork ≥ n;If jobz = 'V' and n>1, lrwork < 2n2+5n+1.If lrwork = -1, then a workspace query is assumed; theroutine only calculates the optimal size of the work, rworkand iwork arrays, returns these values as the first entriesof the work, rwork and iwork arrays, and no error messagerelated to lwork or lrwork or liwork is issued by xerbla.See Application Notes for details.

INTEGER.iworkWorkspace array, DIMENSION max(1, liwork).




If jobz = 'V' and n>1, liwork ≥ 5n+3.

1071

If liwork = -1, then a workspace query is assumed; theroutine only calculates the optimal size of the work, rworkand iwork arrays, returns these values as the first entriesof the work, rwork and iwork arrays, and no error messagerelated to lwork or lrwork or liwork is issued by xerbla.See Application Notes for details.

Output Parameters


a

if itype = 1 or 2, ZH* B*Z = I;if itype = 3, ZH*inv(B)*Z = I;If jobz = 'N', then on exit the upper triangle (if uplo ='U') or the lower triangle (if uplo = 'L') of A, includingthe diagonal, is destroyed.


b

REAL for chegvdwDOUBLE PRECISION for zhegvd.Array, DIMENSION at least max(1, n).If info = 0, contains the eigenvalues in ascending order.


work(1)


rwork(1)


iwork(1)

INTEGER.infoIf info = 0, the execution is successful.If info = -i, the i-th argument had an illegal value.If info = i, and jobz = 'N', then the algorithm failed toconverge; i off-diagonal elements of an intermediatetridiagonal form did not converge to zero;

1072


if info = i, and jobz = 'V', then the algorithm failed tocompute an eigenvalue while working on the submatrix lyingin rows and columns info/(n+1) through mod(info, n+1).




Specific details for the routine hegvd interface are the following:







Application Notes





1073


?sygvxComputes selected eigenvalues and, optionally,eigenvectors of a real generalized symmetricdefinite eigenproblem.

Syntax

Fortran 77:

call ssygvx(itype, jobz, range, uplo, n, a, lda, b, ldb, vl, vu, il, iu,abstol, m, w, z, ldz, work, lwork, iwork, ifail, info)

call dsygvx(itype, jobz, range, uplo, n, a, lda, b, ldb, vl, vu, il, iu,abstol, m, w, z, ldz, work, lwork, iwork, ifail, info)

Fortran 95:

call sygvx(a, b, w [,itype] [,uplo] [,z] [,vl] [,vu] [,il] [,iu] [,m] [,ifail][,abstol] [,info])

Description

This routine computes selected eigenvalues, and optionally, the eigenvectors of a real generalizedsymmetric-definite eigenproblem, of the form


Here A and B are assumed to be symmetric and B is also positive definite. Eigenvalues andeigenvectors can be selected by specifying either a range of values or a range of indices forthe desired eigenvalues.

Input Parameters


itype



CHARACTER*1. Must be 'A' or 'V' or 'I'.range

1074


If range = 'A', the routine computes all eigenvalues.If range = 'V', the routine computes eigenvalueslambda(i) in the half-open interval:

vl<lambda(i)≤ vu.If range = 'I', the routine computes eigenvalues withindices il to iu.



REAL for ssygvxa, b, workDOUBLE PRECISION for dsygvx.Arrays:a(lda,*) contains the upper or lower triangle of thesymmetric matrix A, as specified by uplo.The second dimension of a must be at least max(1, n).b(ldb,*) contains the upper or lower triangle of thesymmetric positive definite matrix B, as specified by uplo.The second dimension of b must be at least max(1, n).work is a workspace array, its dimension max(1, lwork).



REAL for ssygvxvl, vuDOUBLE PRECISION for dsygvx.If range = 'V', the lower and upper bounds of the intervalto be searched for eigenvalues.Constraint: vl< vu.If range = 'A' or 'I', vl and vu are not referenced.


Constraint: 1 ≤ il ≤ iu ≤ n, if n > 0; il=1 and iu=0if n = 0.

1075

If range = 'A' or 'V', il and iu are not referenced.

REAL for ssygvxabstolDOUBLE PRECISION for dsygvx. The absolute error tolerancefor the eigenvalues. See Application Notes for moreinformation.


ldz

ldz ≥ 1; if jobz = 'V', ldz ≥ max(1, n).

INTEGER.lworkThe dimension of the array work;lwork < max(1, 8n).If lwork = -1, then a workspace query is assumed; theroutine only calculates the optimal size of the work array,returns this value as the first entry of the work array, andno error message related to lwork is issued by xerbla.See Application Notes for the suggested value of lwork.

INTEGER.iworkWorkspace array, DIMENSION at least max(1, 5n).

Output Parameters

On exit, the upper triangle (if uplo = 'U') or the lowertriangle (if uplo = 'L') of A, including the diagonal, isoverwritten.

a


b



REAL for ssygvxw, zDOUBLE PRECISION for dsygvx.Arrays:w(*), DIMENSION at least max(1, n).The first m elements of w contain the selected eigenvaluesin ascending order.

1076

z(ldz,*).The second dimension of z must be at least max(1, m).If jobz = 'V', then if info = 0, the first m columns of zcontain the orthonormal eigenvectors of the matrix Acorresponding to the selected eigenvalues, with the i-thcolumn of z holding the eigenvector associated with w(i).The eigenvectors are normalized as follows:if itype = 1 or 2, ZT*B*Z = I;if itype = 3, ZT*inv(B)*Z = I;If jobz = 'N', then z is not referenced.If an eigenvector fails to converge, then that column of zcontains the latest approximation to the eigenvector, andthe index of the eigenvector is returned in ifail.Note: you must ensure that at least max(1,m) columns aresupplied in the array z; if range = 'V', the exact value ofm is not known in advance and an upper bound must beused.


work(1)


INTEGER.infoIf info = 0, the execution is successful.If info = -i, the ith argument had an illegal value.If info > 0, spotrf/dpotrf and ssyevx/dsyevx returnedan error code:

If info = i ≤ n, ssyevx/dsyevx failed to converge, andi eigenvectors failed to converge. Their indices are storedin the array ifail;


1077




Specific details for the routine sygvx interface are the following:





z









Restored based on the presence of the argument z as follows:jobzjobz = 'V', if z is present,jobz = 'N', if z is omitted.Note that there will be an error condition if ifail is present and z isomitted.


1078


Application Notes



If abstol is less than or equal to zero, then ε*||T||1 is be used in its place, where T is thetridiagonal matrix obtained by reducing A to tridiagonal form. Eigenvalues will be computedmost accurately when abstol is set to twice the underflow threshold 2*?lamch('S'), not zero.


For optimum performance use lwork ≥ (nb+3)*n, where nb is the blocksize for ssytrd/dsytrdreturned by ilaenv.





1079


?hegvxComputes selected eigenvalues and, optionally,eigenvectors of a complex generalized Hermitiandefinite eigenproblem.

Syntax

Fortran 77:

call chegvx(itype, jobz, range, uplo, n, a, lda, b, ldb, vl, vu, il, iu,abstol, m, w, z, ldz, work, lwork, rwork, iwork, ifail, info)

call zhegvx(itype, jobz, range, uplo, n, a, lda, b, ldb, vl, vu, il, iu,abstol, m, w, z, ldz, work, lwork, rwork, iwork, ifail, info)

Fortran 95:

call hegvx(a, b, w [,itype] [,uplo] [,z] [,vl] [,vu] [,il] [,iu] [,m] [,ifail][,abstol] [,info])

Description

This routine computes selected eigenvalues, and optionally, the eigenvectors of a complexgeneralized Hermitian-definite eigenproblem, of the form


Here A and B are assumed to be Hermitian and B is also positive definite. Eigenvalues andeigenvectors can be selected by specifying either a range of values or a range of indices forthe desired eigenvalues.

Input Parameters


itype



CHARACTER*1. Must be 'A' or 'V' or 'I'.range

1080


If range = 'A', the routine computes all eigenvalues.If range = 'V', the routine computes eigenvalueslambda(i) in the half-open interval:




COMPLEX for chegvxa, b, workDOUBLE COMPLEX for zhegvx.Arrays:a(lda,*) contains the upper or lower triangle of theHermitian matrix A, as specified by uplo.The second dimension of a must be at least max(1, n).b(ldb,*) contains the upper or lower triangle of theHermitian positive definite matrix B, as specified by uplo.The second dimension of b must be at least max(1, n).work is a workspace array, its dimension max(1, lwork).



REAL for chegvxvl, vuDOUBLE PRECISION for zhegvx.If range = 'V', the lower and upper bounds of the intervalto be searched for eigenvalues.Constraint: vl< vu.If range = 'A' or 'I', vl and vu are not referenced.


Constraint: 1 ≤ il ≤ iu ≤ n, if n > 0; il=1 and iu=0if n = 0.

1081


REAL for chegvxabstolDOUBLE PRECISION for zhegvx.The absolute error tolerance for the eigenvalues. SeeApplication Notes for more information.


ldz


INTEGER.lwork

The dimension of the array work; lwork ≥ max(1, 2n).If lwork = -1, then a workspace query is assumed; theroutine only calculates the optimal size of the work array,returns this value as the first entry of the work array, andno error message related to lwork is issued by xerbla.See Application Notes for the suggested value of lwork.

REAL for chegvxrworkDOUBLE PRECISION for zhegvx.Workspace array, DIMENSION at least max(1, 7n).


Output Parameters

On exit, the upper triangle (if uplo = 'U') or the lowertriangle (if uplo = 'L') of A, including the diagonal, isoverwritten.

a


b



REAL for chegvxwDOUBLE PRECISION for zhegvx.Array, DIMENSION at least max(1, n).

1082


The first m elements of w contain the selected eigenvaluesin ascending order.

COMPLEX for chegvxzDOUBLE COMPLEX for zhegvx.Array z(ldz,*). The second dimension of z must be at leastmax(1, m).If jobz = 'V', then if info = 0, the first m columns of zcontain the orthonormal eigenvectors of the matrix Acorresponding to the selected eigenvalues, with the i-thcolumn of z holding the eigenvector associated with w(i).The eigenvectors are normalized as follows:if itype = 1 or 2, ZH*B*Z = I;if itype = 3, ZH*inv(B)*Z = I;If jobz = 'N', then z is not referenced.If an eigenvector fails to converge, then that column of zcontains the latest approximation to the eigenvector, andthe index of the eigenvector is returned in ifail.Note: you must ensure that at least max(1,m) columns aresupplied in the array z; if range = 'V', the exact value ofm is not known in advance and an upper bound must beused.


work(1)


INTEGER.infoIf info = 0, the execution is successful.If info = -i, the ith argument had an illegal value.If info > 0, cpotrf/zpotrf and cheevx/zheevx returnedan error code:

If info = i ≤ n, cheevx/zheevx failed to converge, andi eigenvectors failed to converge. Their indices are storedin the array ifail;

1083





Specific details for the routine hegvx interface are the following:





z










Restored based on the presence of arguments vl, vu, il, iu as follows:rangerange = 'V', if one of or both vl and vu are present,range = 'I', if one of or both il and iu are present,range = 'A', if none of vl, vu, il, iu is present,

1084


Note that there will be an error condition if one of or both vl and vuare present and at the same time one of or both il and iu are present.

Application Notes





For optimum performance use lwork ≥ (nb+1)*n, where nb is the blocksize for chetrd/zhetrdreturned by ilaenv.





1085


?spgvComputes all eigenvalues and, optionally,eigenvectors of a real generalized symmetricdefinite eigenproblem with matrices in packedstorage.

Syntax

Fortran 77:

call sspgv(itype, jobz, uplo, n, ap, bp, w, z, ldz, work, info)

call dspgv(itype, jobz, uplo, n, ap, bp, w, z, ldz, work, info)

Fortran 95:

call spgv(a, b, w [,itype] [,uplo] [,z] [,info])

Description



Here A and B are assumed to be symmetric, stored in packed format, and B is also positivedefinite.

Input Parameters


itype



CHARACTER*1. Must be 'U' or 'L'.uploIf uplo = 'U', arrays ap and bp store the upper trianglesof A and B;

1086


If uplo = 'L', arrays ap and bp store the lower trianglesof A and B.


REAL for sspgvap, bp, workDOUBLE PRECISION for dspgv.Arrays:ap(*) contains the packed upper or lower triangle of thesymmetric matrix A, as specified by uplo.The dimension of ap must be at least max(1, n*(n+1)/2).bp(*) contains the packed upper or lower triangle of thesymmetric matrix B, as specified by uplo.The dimension of bp must be at least max(1, n*(n+1)/2).work(*) is a workspace array, DIMENSION at least max(1,3n).


≥ 1. If jobz = 'V', ldz ≥ max(1, n).

ldz

Output Parameters

On exit, the contents of ap are overwritten.ap

On exit, contains the triangular factor U or L from theCholesky factorization B = UT*U or B = L*LT, in the samestorage format as B.

bp

REAL for sspgvw, zDOUBLE PRECISION for dspgv.Arrays:w(*), DIMENSION at least max(1, n).If info = 0, contains the eigenvalues in ascending order.z(ldz,*).The second dimension of z must be at least max(1, n).If jobz = 'V', then if info = 0, z contains the matrix Zof eigenvectors. The eigenvectors are normalized as follows:if itype = 1 or 2, ZT*B*Z = I;if itype = 3, ZT*inv(B)*Z = I;If jobz = 'N', then z is not referenced.

INTEGER.info

1087


If info = 0, the execution is successful.If info = -i, the i-th argument had an illegal value.If info > 0, spptrf/dpptrf and sspev/dspev returnedan error code:

If info = i ≤ n, sspev/dspev failed to converge, and i





Specific details for the routine spgv interface are the following:


a


b






1088


?hpgvComputes all eigenvalues and, optionally,eigenvectors of a complex generalized Hermitiandefinite eigenproblem with matrices in packedstorage.

Syntax

Fortran 77:

call chpgv(itype, jobz, uplo, n, ap, bp, w, z, ldz, work, rwork, info)

call zhpgv(itype, jobz, uplo, n, ap, bp, w, z, ldz, work, rwork, info)

Fortran 95:

call hpgv(a, b, w [,itype] [,uplo] [,z] [,info])

Description



Here A and B are assumed to be Hermitian, stored in packed format, and B is also positivedefinite.

Input Parameters


itype



CHARACTER*1. Must be 'U' or 'L'.uploIf uplo = 'U', arrays ap and bp store the upper trianglesof A and B;

1089


If uplo = 'L', arrays ap and bp store the lower trianglesof A and B.


COMPLEX for chpgvap, bp, workDOUBLE COMPLEX for zhpgv.Arrays:ap(*) contains the packed upper or lower triangle of theHermitian matrix A, as specified by uplo.The dimension of ap must be at least max(1, n*(n+1)/2).bp(*) contains the packed upper or lower triangle of theHermitian matrix B, as specified by uplo.The dimension of bp must be at least max(1, n*(n+1)/2).work(*) is a workspace array, DIMENSION at least max(1,2n-1).


≥ 1. If jobz = 'V', ldz ≥ max(1, n).

ldz

REAL for chpgvrworkDOUBLE PRECISION for zhpgv.Workspace array, DIMENSION at least max(1, 3n-2).

Output Parameters


On exit, contains the triangular factor U or L from theCholesky factorization B = UH*U or B = L*LH, in the samestorage format as B.

bp

REAL for chpgvwDOUBLE PRECISION for zhpgv.Array, DIMENSION at least max(1, n).If info = 0, contains the eigenvalues in ascending order.

COMPLEX for chpgvzDOUBLE COMPLEX for zhpgv.Array z(ldz,*).The second dimension of z must be at least max(1, n).If jobz = 'V', then if info = 0, z contains the matrix Zof eigenvectors. The eigenvectors are normalized as follows:

1090


if itype = 1 or 2, ZH*B*Z = I;if itype = 3, ZH*inv(B)*Z = I;If jobz = 'N', then z is not referenced.

INTEGER.infoIf info = 0, the execution is successful.If info = -i, the i-th argument had an illegal value.If info > 0, cpptrf/zpptrf and chpev/zhpev returnedan error code:

If info = i ≤ n, chpev/zhpev failed to converge, and i





Specific details for the routine hpgv interface are the following:


a


b






1091


?spgvdComputes all eigenvalues and, optionally,eigenvectors of a real generalized symmetricdefinite eigenproblem with matrices in packedstorage. If eigenvectors are desired, it uses a divideand conquer method.

Syntax

Fortran 77:

call sspgvd(itype, jobz, uplo, n, ap, bp, w, z, ldz, work, lwork, iwork,liwork, info)

call dspgvd(itype, jobz, uplo, n, ap, bp, w, z, ldz, work, lwork, iwork,liwork, info)

Fortran 95:

call spgvd(a, b, w [,itype] [,uplo] [,z] [,info])

Description



Here A and B are assumed to be symmetric, stored in packed format, and B is also positivedefinite.


Input Parameters


itype



1092


CHARACTER*1. Must be 'U' or 'L'.uploIf uplo = 'U', arrays ap and bp store the upper trianglesof A and B;If uplo = 'L', arrays ap and bp store the lower trianglesof A and B.


REAL for sspgvdap, bp, workDOUBLE PRECISION for dspgvd.Arrays:ap(*) contains the packed upper or lower triangle of thesymmetric matrix A, as specified by uplo.The dimension of ap must be at least max(1, n*(n+1)/2).bp(*) contains the packed upper or lower triangle of thesymmetric matrix B, as specified by uplo.The dimension of bp must be at least max(1, n*(n+1)/2).work is a workspace array, its dimension max(1, lwork).


≥ 1. If jobz = 'V', ldz ≥ max(1, n).

ldz



If jobz = 'N' and n>1, lwork ≥ 2n;

If jobz = 'V' and n>1, lwork ≥ 2n2+6n+1.If lwork = -1, then a workspace query is assumed; theroutine only calculates the required sizes of the work andiwork arrays, returns these values as the first entries of thework and iwork arrays, and no error message related tolwork or liwork is issued by xerbla. See Application Notesfor details.

INTEGER.iworkWorkspace array, its dimension max(1, lwork).


1093




If jobz = 'V' and n>1, liwork ≥ 5n+3.If liwork = -1, then a workspace query is assumed; theroutine only calculates the required sizes of the work andiwork arrays, returns these values as the first entries of thework and iwork arrays, and no error message related tolwork or liwork is issued by xerbla. See Application Notesfor details.

Output Parameters



bp

REAL for sspgvw, zDOUBLE PRECISION for dspgv.Arrays:w(*), DIMENSION at least max(1, n).If info = 0, contains the eigenvalues in ascending order.z(ldz,*).The second dimension of z must be at least max(1, n).If jobz = 'V', then if info = 0, z contains the matrix Zof eigenvectors. The eigenvectors are normalized as follows:if itype = 1 or 2, ZT*B*Z = I;if itype = 3, ZT*inv(B)*Z = I;If jobz = 'N', then z is not referenced.


work(1)


iwork(1)

INTEGER.infoIf info = 0, the execution is successful.If info = -i, the i-th argument had an illegal value.

1094


If info > 0, spptrf/dpptrf and sspevd/dspevd returnedan error code:

If info = i ≤ n, sspevd/dspevd failed to converge, andi off-diagonal elements of an intermediate tridiagonal didnot converge to zero;




Specific details for the routine spgvd interface are the following:


a


b






Application Notes


1095





Application Notes

?hpgvdComputes all eigenvalues and, optionally,eigenvectors of a complex generalized Hermitiandefinite eigenproblem with matrices in packedstorage. If eigenvectors are desired, it uses a divideand conquer method.

Syntax

Fortran 77:

call chpgvd(itype, jobz, uplo, n, ap, bp, w, z, ldz, work, lwork, rwork,lrwork, iwork, liwork, info)

call zhpgvd(itype, jobz, uplo, n, ap, bp, w, z, ldz, work, lwork, rwork,lrwork, iwork, liwork, info)

Fortran 95:

call hpgvd(a, b, w [,itype] [,uplo] [,z] [,info])

Description



1096


Here A and B are assumed to be Hermitian, stored in packed format, and B is also positivedefinite.


Input Parameters


itype





COMPLEX for chpgvdap, bp, workDOUBLE COMPLEX for zhpgvd.Arrays:ap(*) contains the packed upper or lower triangle of theHermitian matrix A, as specified by uplo.The dimension of ap must be at least max(1, n*(n+1)/2).bp(*) contains the packed upper or lower triangle of theHermitian matrix B, as specified by uplo.The dimension of bp must be at least max(1, n*(n+1)/2).work is a workspace array, its dimension max(1, lwork).


≥ 1. If jobz = 'V', ldz ≥ max(1, n).

ldz



1097


If jobz = 'N' and n>1, lwork ≥ n;

If jobz = 'V' and n>1, lwork ≥ 2n.If lwork = -1, then a workspace query is assumed; theroutine only calculates the optimal size of the work, rworkand iwork arrays, returns these values as the first entriesof the work, rwork and iwork arrays, and no error messagerelated to lwork or lrwork or liwork is issued by xerbla.See Application Notes for details.

REAL for chpgvdrworkDOUBLE PRECISION for zhpgvd.Workspace array, its dimension max(1, lrwork).



If jobz = 'N' and n>1, lrwork ≥ n;

If jobz = 'V' and n>1, lrwork ≥ 2n2+5n+1.If lrwork = -1, then a workspace query is assumed; theroutine only calculates the optimal size of the work, rworkand iwork arrays, returns these values as the first entriesof the work, rwork and iwork arrays, and no error messagerelated to lwork or lrwork or liwork is issued by xerbla.See Application Notes for details.





If jobz = 'V' and n>1, liwork ≥ 5n+3.If liwork = -1, then a workspace query is assumed; theroutine only calculates the optimal size of the work, rworkand iwork arrays, returns these values as the first entries

1098


of the work, rwork and iwork arrays, and no error messagerelated to lwork or lrwork or liwork is issued by xerbla.See Application Notes for details.

Output Parameters



bp

REAL for chpgvdwDOUBLE PRECISION for zhpgvd.Array, DIMENSION at least max(1, n).If info = 0, contains the eigenvalues in ascending order.

COMPLEX for chpgvdzDOUBLE COMPLEX for zhpgvd.Array z(ldz,*).The second dimension of z must be at least max(1, n).If jobz = 'V', then if info = 0, z contains the matrix Zof eigenvectors. The eigenvectors are normalized as follows:if itype = 1 or 2, ZH*B*Z = I;if itype = 3, ZH*inv(B)*Z = I;If jobz = 'N', then z is not referenced.


work(1)


rwork(1)


iwork(1)

INTEGER.infoIf info = 0, the execution is successful.If info = -i, the i-th argument had an illegal value.If info > 0, cpptrf/zpptrf and chpevd/zhpevd returnedan error code:

1099


If info = i ≤ n, chpevd/zhpevd failed to converge, andi off-diagonal elements of an intermediate tridiagonal didnot converge to zero;




Specific details for the routine hpgvd interface are the following:


a


b






Application Notes



1100




?spgvxComputes selected eigenvalues and, optionally,eigenvectors of a real generalized symmetricdefinite eigenproblem with matrices in packedstorage.

Syntax

Fortran 77:

call sspgvx(itype, jobz, range, uplo, n, ap, bp, vl, vu, il, iu, abstol, m,w, z, ldz, work, iwork, ifail, info)

call dspgvx(itype, jobz, range, uplo, n, ap, bp, vl, vu, il, iu, abstol, m,w, z, ldz, work, iwork, ifail, info)

Fortran 95:

call spgvx(a, b, w [,itype] [,uplo] [,z] [,vl] [,vu] [,il] [,iu] [,m] [,ifail][,abstol] [,info])

Description

This routine computes selected eigenvalues, and optionally, the eigenvectors of a real generalizedsymmetric-definite eigenproblem, of the form

Ax = λBx, ABx = λx, or B Ax = λx.

Here A and B are assumed to be symmetric, stored in packed format, and B is also positivedefinite. Eigenvalues and eigenvectors can be selected by specifying either a range of valuesor a range of indices for the desired eigenvalues.

1101


Input Parameters


itype







REAL for sspgvxap, bp, workDOUBLE PRECISION for dspgvx.Arrays:ap(*) contains the packed upper or lower triangle of thesymmetric matrix A, as specified by uplo.The dimension of ap must be at least max(1, n*(n+1)/2).bp(*) contains the packed upper or lower triangle of thesymmetric matrix B, as specified by uplo.The dimension of bp must be at least max(1, n*(n+1)/2).work(*) is a workspace array, DIMENSION at least max(1,8n).

REAL for sspgvxvl, vuDOUBLE PRECISION for dspgvx.

1102


If range = 'V', the lower and upper bounds of the intervalto be searched for eigenvalues.Constraint: vl< vu.If range = 'A' or 'I', vl and vu are not referenced.



REAL for sspgvxabstolDOUBLE PRECISION for dspgvx.The absolute error tolerance for the eigenvalues. SeeApplication Notes for more information.


ldz



Output Parameters



bp



REAL for sspgvxw, zDOUBLE PRECISION for dspgvx.Arrays:w(*), DIMENSION at least max(1, n).If info = 0, contains the eigenvalues in ascending order.z(ldz,*).The second dimension of z must be at least max(1, n).

1103

If jobz = 'V', then if info = 0, the first m columns of zcontain the orthonormal eigenvectors of the matrix Acorresponding to the selected eigenvalues, with the i-thcolumn of z holding the eigenvector associated with w(i).The eigenvectors are normalized as follows:if itype = 1 or 2, ZT*B*Z = I;if itype = 3, ZT*inv(B)*Z = I;If jobz = 'N', then z is not referenced.If an eigenvector fails to converge, then that column of zcontains the latest approximation to the eigenvector, andthe index of the eigenvector is returned in ifail.Note: you must ensure that at least max(1,m) columns aresupplied in the array z; if range = 'V', the exact value ofm is not known in advance and an upper bound must beused.


INTEGER.infoIf info = 0, the execution is successful.If info = -i, the i-th argument had an illegal value.If info > 0, spptrf/dpptrf and sspevx/dspevx returnedan error code:

If info = i ≤ n, sspevx/dspevx failed to converge, andi eigenvectors failed to converge. Their indices are storedin the array ifail;


1104




Specific details for the routine spgvx interface are the following:


a


b



z











1105


Application Notes





?hpgvxComputes selected eigenvalues and, optionally,eigenvectors of a generalized Hermitian definiteeigenproblem with matrices in packed storage.

Syntax

Fortran 77:

call chpgvx(itype, jobz, range, uplo, n, ap, bp, vl, vu, il, iu, abstol, m,w, z, ldz, work, rwork, iwork, ifail, info)

call zhpgvx(itype, jobz, range, uplo, n, ap, bp, vl, vu, il, iu, abstol, m,w, z, ldz, work, rwork, iwork, ifail, info)

Fortran 95:

call hpgvx(a, b, w [,itype] [,uplo] [,z] [,vl] [,vu] [,il] [,iu] [,m] [,ifail][,abstol] [,info])

Description

This routine computes selected eigenvalues, and optionally, the eigenvectors of a complexgeneralized Hermitian-definite eigenproblem, of the form

Ax = λBx, ABx = λx, or B Ax = λx.

Here A and B are assumed to be Hermitian, stored in packed format, and B is also positivedefinite. Eigenvalues and eigenvectors can be selected by specifying either a range of valuesor a range of indices for the desired eigenvalues.

1106


Input Parameters


itype







COMPLEX for chpgvxap, bp, workDOUBLE COMPLEX for zhpgvx.Arrays:ap(*) contains the packed upper or lower triangle of theHermitian matrix A, as specified by uplo.The dimension of ap must be at least max(1, n*(n+1)/2).bp(*) contains the packed upper or lower triangle of theHermitian matrix B, as specified by uplo.The dimension of bp must be at least max(1, n*(n+1)/2).work(*) is a workspace array, DIMENSION at least max(1,2n).

REAL for chpgvxvl, vuDOUBLE PRECISION for zhpgvx.

1107


If range = 'V', the lower and upper bounds of the intervalto be searched for eigenvalues.Constraint: vl< vu.If range = 'A' or 'I', vl and vu are not referenced.



REAL for chpgvxabstolDOUBLE PRECISION for zhpgvx.The absolute error tolerance for the eigenvalues.See Application Notes for more information.


≥ 1. If jobz = 'V', ldz ≥ max(1, n).

ldz

REAL for chpgvxrworkDOUBLE PRECISION for zhpgvx.Workspace array, DIMENSION at least max(1, 7n).


Output Parameters



bp



REAL for chpgvxwDOUBLE PRECISION for zhpgvx.Array, DIMENSION at least max(1, n).If info = 0, contains the eigenvalues in ascending order.

1108

COMPLEX for chpgvxzDOUBLE COMPLEX for zhpgvx.Array z(ldz,*).The second dimension of z must be at least max(1, n).If jobz = 'V', then if info = 0, the first m columns of zcontain the orthonormal eigenvectors of the matrix Acorresponding to the selected eigenvalues, with the i-thcolumn of z holding the eigenvector associated with w(i).The eigenvectors are normalized as follows:if itype = 1 or 2, ZH*B*Z = I;if itype = 3, ZH*inv(B)*Z = I;If jobz = 'N', then z is not referenced.If an eigenvector fails to converge, then that column of zcontains the latest approximation to the eigenvector, andthe index of the eigenvector is returned in ifail.Note: you must ensure that at least max(1,m) columns aresupplied in the array z; if range = 'V', the exact value ofm is not known in advance and an upper bound must beused.


INTEGER.infoIf info = 0, the execution is successful.If info = -i, the i-th argument had an illegal value.If info > 0, cpptrf/zpptrf and chpevx/zhpevx returnedan error code:

If info = i ≤ n, chpevx/zhpevx failed to converge, andi eigenvectors failed to converge. Their indices are storedin the array ifail;


1109




Specific details for the routine hpgvx interface are the following:


a


b



z











1110


Application Notes





?sbgvComputes all eigenvalues and, optionally,eigenvectors of a real generalized symmetricdefinite eigenproblem with banded matrices.

Syntax

Fortran 77:

call ssbgv(jobz, uplo, n, ka, kb, ab, ldab, bb, ldbb, w, z, ldz, work, info)

call dsbgv(jobz, uplo, n, ka, kb, ab, ldab, bb, ldbb, w, z, ldz, work, info)

Fortran 95:

call sbgv(a, b, w [,uplo] [,z] [,info])

Description

This routine computes all the eigenvalues, and optionally, the eigenvectors of a real generalized

symmetric-definite banded eigenproblem, of the form Ax = λBx. Here A and B are assumedto be symmetric and banded, and B is also positive definite.

Input Parameters



1111


If uplo = 'U', arrays ab and bb store the upper trianglesof A and B;If uplo = 'L', arrays ab and bb store the lower trianglesof A and B.



(ka ≥ 0).

INTEGER. The number of super- or sub-diagonals in B (kb

≥ 0).

kb

REAL for ssbgvab, bb, workDOUBLE PRECISION for dsbgvArrays:ab (ldab,*) is an array containing either upper or lowertriangular part of the symmetric matrix A (as specified byuplo) in band storage format.The second dimension of the array ab must be at leastmax(1, n).bb(ldbb,*) is an array containing either upper or lowertriangular part of the symmetric matrix B (as specified byuplo) in band storage format.The second dimension of the array bb must be at leastmax(1, n).work(*) is a workspace array, dimension at least max(1,3n)


ldab


ldbb


≥ 1. If jobz = 'V', ldz ≥ max(1, n).

ldz

Output Parameters

On exit, the contents of ab are overwritten.ab

1112


On exit, contains the factor S from the split Choleskyfactorization B = ST*S, as returned by spbstf/dpbstf.

bb

REAL for ssbgvw, zDOUBLE PRECISION for dsbgvArrays:w(*), DIMENSION at least max(1, n).If info = 0, contains the eigenvalues in ascending order.z(ldz,*).The second dimension of z must be at least max(1, n).If jobz = 'V', then if info = 0, z contains the matrix Zof eigenvectors, with the i-th column of z holding theeigenvector associated with w(i). The eigenvectors arenormalized so that ZT*B*Z = I.If jobz = 'N', then z is not referenced.

INTEGER.infoIf info = 0, the execution is successful.If info = -i, the i-th argument had an illegal value.If info > 0, and

if i ≤ n, the algorithm failed to converge, and i off-diagonalelements of an intermediate tridiagonal did not converge tozero;

if info = n + i, for 1 ≤ i ≤ n, then spbstf/dpbstfreturned info = i and B is not positive-definite. Thefactorization of B could not be completed and no eigenvaluesor eigenvectors were computed.



Specific details for the routine sbgv interface are the following:


a


b

1113






?hbgvComputes all eigenvalues and, optionally,eigenvectors of a complex generalized Hermitiandefinite eigenproblem with banded matrices.

Syntax

Fortran 77:

call chbgv(jobz, uplo, n, ka, kb, ab, ldab, bb, ldbb, w, z, ldz, work, rwork,info)

call zhbgv(jobz, uplo, n, ka, kb, ab, ldab, bb, ldbb, w, z, ldz, work, rwork,info)

Fortran 95:

call hbgv(a, b, w [,uplo] [,z] [,info])

Description

This routine computes all the eigenvalues, and optionally, the eigenvectors of a complex

generalized Hermitian-definite banded eigenproblem, of the form Ax = λBx. Here A and B areassumed to be Hermitian and banded, and B is also positive definite.

Input Parameters


CHARACTER*1. Must be 'U' or 'L'.uploIf uplo = 'U', arrays ab and bb store the upper trianglesof A and B;

1114


If uplo = 'L', arrays ab and bb store the lower trianglesof A and B.



(ka ≥ 0).


≥ 0).

kb

COMPLEX for chbgvab, bb, workDOUBLE COMPLEX for zhbgvArrays:ab (ldab,*) is an array containing either upper or lowertriangular part of the Hermitian matrix A (as specified byuplo) in band storage format.The second dimension of the array ab must be at leastmax(1, n).bb(ldbb,*) is an array containing either upper or lowertriangular part of the Hermitian matrix B (as specified byuplo) in band storage format.The second dimension of the array bb must be at leastmax(1, n).work(*) is a workspace array, dimension at least max(1,n).


ldab


ldbb


≥ 1. If jobz = 'V', ldz ≥ max(1, n).

ldz

REAL for chbgvrworkDOUBLE PRECISION for zhbgv.Workspace array, DIMENSION at least max(1, 3n).

Output Parameters


1115


On exit, contains the factor S from the split Choleskyfactorization B = SH*S, as returned by cpbstf/zpbstf.

bb

REAL for chbgvwDOUBLE PRECISION for zhbgv.Array, DIMENSION at least max(1, n).If info = 0, contains the eigenvalues in ascending order.

COMPLEX for chbgvzDOUBLE COMPLEX for zhbgvArray z(ldz,*).The second dimension of z must be at least max(1, n).If jobz = 'V', then if info = 0, z contains the matrix Zof eigenvectors, with the i-th column of z holding theeigenvector associated with w(i). The eigenvectors arenormalized so that ZH*B*Z = I.If jobz = 'N', then z is not referenced.



if info = n + i, for 1 ≤ i ≤ n, then cpbstf/zpbstfreturned info = i and B is not positive-definite. Thefactorization of B could not be completed and no eigenvaluesor eigenvectors were computed.



Specific details for the routine hbgv interface are the following:


a

1116



b





?sbgvdComputes all eigenvalues and, optionally,eigenvectors of a real generalized symmetricdefinite eigenproblem with banded matrices. Ifeigenvectors are desired, it uses a divide andconquer method.

Syntax

Fortran 77:

call ssbgvd(jobz, uplo, n, ka, kb, ab, ldab, bb, ldbb, w, z, ldz, work, lwork,iwork, liwork, info)

call dsbgvd(jobz, uplo, n, ka, kb, ab, ldab, bb, ldbb, w, z, ldz, work, lwork,iwork, liwork, info)

Fortran 95:

call sbgvd(a, b, w [,uplo] [,z] [,info])

Description

This routine computes all the eigenvalues, and optionally, the eigenvectors of a real generalized

symmetric-definite banded eigenproblem, of the form Ax = λBx. Here A and B are assumedto be symmetric and banded, and B is also positive definite.


Input Parameters

CHARACTER*1. Must be 'N' or 'V'.jobz

1117


If jobz = 'N', then compute eigenvalues only.If jobz = 'V', then compute eigenvalues and eigenvectors.

CHARACTER*1. Must be 'U' or 'L'.uploIf uplo = 'U', arrays ab and bb store the upper trianglesof A and B;If uplo = 'L', arrays ab and bb store the lower trianglesof A and B.



(ka ≥ 0).


≥ 0).

kb

REAL for ssbgvdab, bb, workDOUBLE PRECISION for dsbgvdArrays:ab (ldab,*) is an array containing either upper or lowertriangular part of the symmetric matrix A (as specified byuplo) in band storage format.The second dimension of the array ab must be at leastmax(1, n).bb(ldbb,*) is an array containing either upper or lowertriangular part of the symmetric matrix B (as specified byuplo) in band storage format.The second dimension of the array bb must be at leastmax(1, n).work is a workspace array, its dimension max(1, lwork).


ldab


ldbb


≥ 1. If jobz = 'V', ldz ≥ max(1, n).

ldz


1118



If jobz = 'N' and n>1, lwork ≥ 3n;

If jobz = 'V' and n>1, lwork ≥ 2n2+5n+1.If lwork = -1, then a workspace query is assumed; theroutine only calculates the optimal size of the work andiwork arrays, returns these values as the first entries of thework and iwork arrays, and no error message related tolwork or liwork is issued by xerbla. See Application Notesfor details.





If jobz = 'V' and n>1, liwork ≥ 5n+3.If liwork = -1, then a workspace query is assumed; theroutine only calculates the optimal size of the work andiwork arrays, returns these values as the first entries of thework and iwork arrays, and no error message related tolwork or liwork is issued by xerbla. See Application Notesfor details.

Output Parameters



bb

REAL for ssbgvdw, zDOUBLE PRECISION for dsbgvdArrays:w(*), DIMENSION at least max(1, n).If info = 0, contains the eigenvalues in ascending order.z(ldz,*).The second dimension of z must be at least max(1, n).

1119


If jobz = 'V', then if info = 0, z contains the matrix Zof eigenvectors, with the i-th column of z holding theeigenvector associated with w(i). The eigenvectors arenormalized so that ZT*B*Z = I.If jobz = 'N', then z is not referenced.


work(1)


iwork(1)






Specific details for the routine sbgvd interface are the following:


a


b





1120



Application Notes





?hbgvdComputes all eigenvalues and, optionally,eigenvectors of a complex generalized Hermitiandefinite eigenproblem with banded matrices. Ifeigenvectors are desired, it uses a divide andconquer method.

Syntax

Fortran 77:

call chbgvd(jobz, uplo, n, ka, kb, ab, ldab, bb, ldbb, w, z, ldz, work, lwork,rwork, lrwork, iwork, liwork, info)

call zhbgvd(jobz, uplo, n, ka, kb, ab, ldab, bb, ldbb, w, z, ldz, work, lwork,rwork, lrwork, iwork, liwork, info)

Fortran 95:

call hbgvd(a, b, w [,uplo] [,z] [,info])

1121


Description

This routine computes all the eigenvalues, and optionally, the eigenvectors of a complex

generalized Hermitian-definite banded eigenproblem, of the form Ax = λBx. Here A and B areassumed to be Hermitian and banded, and B is also positive definite.


Input Parameters





(ka ≥ 0).


≥ 0).

kb

COMPLEX for chbgvdab, bb, workDOUBLE COMPLEX for zhbgvdArrays:ab (ldab,*) is an array containing either upper or lowertriangular part of the Hermitian matrix A (as specified byuplo) in band storage format.The second dimension of the array ab must be at leastmax(1, n).bb(ldbb,*) is an array containing either upper or lowertriangular part of the Hermitian matrix B (as specified byuplo) in band storage format.The second dimension of the array bb must be at leastmax(1, n).work is a workspace array, its dimension max(1, lwork).

1122


ldab


ldbb


≥ 1. If jobz = 'V', ldz ≥ max(1, n).

ldz



If jobz = 'N' and n>1, lwork ≥ n;If jobz = 'V' and n>1, lwork < 2n2.If lwork = -1, then a workspace query is assumed; theroutine only calculates the optimal size of the work, rworkand iwork arrays, returns these values as the first entriesof the work, rwork and iwork arrays, and no error messagerelated to lwork or lrwork or liwork is issued by xerbla.See Application Notes for details.

REAL for chbgvdrworkDOUBLE PRECISION for zhbgvd.Workspace array, DIMENSION max(1, lrwork).



If jobz = 'N' and n>1, lrwork ≥ n;

If jobz = 'V' and n>1, lrwork ≥ 2n2+5n +1.If lrwork = -1, then a workspace query is assumed; theroutine only calculates the optimal size of the work, rworkand iwork arrays, returns these values as the first entriesof the work, rwork and iwork arrays, and no error messagerelated to lwork or lrwork or liwork is issued by xerbla.See Application Notes for details.


1123

If n ≤ 1, liwork < 1;


If jobz = 'V' and n>1, liwork ≥ 5n+3.If liwork = -1, then a workspace query is assumed; theroutine only calculates the optimal size of the work, rworkand iwork arrays, returns these values as the first entriesof the work, rwork and iwork arrays, and no error messagerelated to lwork or lrwork or liwork is issued by xerbla.See Application Notes for details.

Output Parameters



bb

REAL for chbgvdwDOUBLE PRECISION for zhbgvd.Array, DIMENSION at least max(1, n) .If info = 0, contains the eigenvalues in ascending order.

COMPLEX for chbgvdzDOUBLE COMPLEX for zhbgvdArray z(ldz,*) .The second dimension of z must be at least max(1, n).If jobz = 'V', then if info = 0, z contains the matrix Zof eigenvectors , with the i-th column of z holding theeigenvector associated with w(i). The eigenvectors arenormalized so that ZH*B*Z = I.If jobz = 'N', then z is not referenced.


work(1)


rwork(1)

1124


iwork(1)

INTEGER.infoIf info = 0, the execution is successful.If info = -i, the ith argument had an illegal value.If info > 0, and





Specific details for the routine hbgvd interface are the following:


a


b





Application Notes


1125





?sbgvxComputes selected eigenvalues and, optionally,eigenvectors of a real generalized symmetricdefinite eigenproblem with banded matrices.

Syntax

Fortran 77:

call ssbgvx(jobz, range, uplo, n, ka, kb, ab, ldab, bb, ldbb, q, ldq, vl, vu,il, iu, abstol, m, w, z, ldz, work, iwork, ifail, info)

call dsbgvx(jobz, range, uplo, n, ka, kb, ab, ldab, bb, ldbb, q, ldq, vl, vu,il, iu, abstol, m, w, z, ldz, work, iwork, ifail, info)

Fortran 95:

call sbgvx(a, b, w [,uplo] [,z] [,vl] [,vu] [,il] [,iu] [,m] [,ifail] [,q][,abstol] [,info])

Description

This routine computes selected eigenvalues, and optionally, the eigenvectors of a real generalized

symmetric-definite banded eigenproblem, of the form Ax = λBx. Here A and B are assumedto be symmetric and banded, and B is also positive definite. Eigenvalues and eigenvectors canbe selected by specifying either all eigenvalues, a range of values or a range of indices for thedesired eigenvalues.

1126


Input Parameters







(ka ≥ 0).


≥ 0).

kb

REAL for ssbgvxab, bb, workDOUBLE PRECISION for dsbgvxArrays:ab (ldab,*) is an array containing either upper or lowertriangular part of the symmetric matrix A (as specified byuplo) in band storage format.The second dimension of the array ab must be at leastmax(1, n).bb(ldbb,*) is an array containing either upper or lowertriangular part of the symmetric matrix B (as specified byuplo) in band storage format.The second dimension of the array bb must be at leastmax(1, n).work(*) is a workspace array, DIMENSION (7*n).

1127


ldab


ldbb

REAL for ssbgvxvl, vuDOUBLE PRECISION for dsbgvx.If range = 'V', the lower and upper bounds of the intervalto be searched for eigenvalues.Constraint: vl< vu.If range = 'A' or 'I', vl and vu are not referenced.



REAL for ssbgvxabstolDOUBLE PRECISION for dsbgvx.The absolute error tolerance for the eigenvalues. SeeApplication Notes for more information.


≥ 1. If jobz = 'V', ldz ≥ max(1, n).

ldz

INTEGER. The leading dimension of the output array q; ldq< 1.

ldq

If jobz = 'V', ldq < max(1, n).

INTEGER.iworkWorkspace array, DIMENSION (5*n).

Output Parameters



bb


0 ≤ m ≤ n. If range = 'A', m = n, and if range = 'I',

1128

m = iu-il+1.

REAL for ssbgvxw, z, qDOUBLE PRECISION for dsbgvxArrays:w(*), DIMENSION at least max(1, n) .If info = 0, contains the eigenvalues in ascending order.z(ldz,*) .The second dimension of z must be at least max(1, n).If jobz = 'V', then if info = 0, z contains the matrix Zof eigenvectors , with the i-th column of z holding theeigenvector associated with w(i). The eigenvectors arenormalized so that ZT*B*Z = I.If jobz = 'N', then z is not referenced.q(ldq,*) .The second dimension of q must be at least max(1, n).If jobz = 'V', then q contains the n-by-n matrix used inthe reduction of Ax = lambda*Bx to standard form, thatis, Cx = lambda*x and consequently C to tridiagonal form.If jobz = 'N', then q is not referenced.

INTEGER.ifailArray, DIMENSION (m).If jobz = 'V', then if info = 0, the first m elements ofifail are zero; if info > 0, the ifail contains the indicesof the eigenvectors that failed to converge.If jobz = 'N', then ifail is not referenced.

INTEGER.infoIf info = 0, the execution is successful.If info = -i, the ith argument had an illegal value.If info > 0, and



1129




Specific details for the routine sbgvx interface are the following:


a


b











Restored based on the presence of the argument z as follows:jobzjobz = 'V', if z is present,jobz = 'N', if z is omitted.Note that there will be an error condition if ifail or q is present andz is omitted.


1130


Application Notes





?hbgvxComputes selected eigenvalues and, optionally,eigenvectors of a complex generalized Hermitiandefinite eigenproblem with banded matrices.

Syntax

Fortran 77:

call chbgvx(jobz, range, uplo, n, ka, kb, ab, ldab, bb, ldbb, q, ldq, vl, vu,il, iu, abstol, m, w, z, ldz, work, rwork, iwork, ifail, info)

call zhbgvx(jobz, range, uplo, n, ka, kb, ab, ldab, bb, ldbb, q, ldq, vl, vu,il, iu, abstol, m, w, z, ldz, work, rwork, iwork, ifail, info)

Fortran 95:

call hbgvx(a, b, w [,uplo] [,z] [,vl] [,vu] [,il] [,iu] [,m] [,ifail] [,q][,abstol] [,info])

Description

This routine computes selected eigenvalues, and optionally, the eigenvectors of a complex

generalized Hermitian-definite banded eigenproblem, of the form Ax = λBx. Here A and B areassumed to be Hermitian and banded, and B is also positive definite. Eigenvalues andeigenvectors can be selected by specifying either all eigenvalues, a range of values or a rangeof indices for the desired eigenvalues.

1131


Input Parameters







(ka ≥ 0).


≥ 0).

kb

COMPLEX for chbgvxab, bb, workDOUBLE COMPLEX for zhbgvxArrays:ab (ldab,*) is an array containing either upper or lowertriangular part of the Hermitian matrix A (as specified byuplo) in band storage format.The second dimension of the array ab must be at leastmax(1, n).bb(ldbb,*) is an array containing either upper or lowertriangular part of the Hermitian matrix B (as specified byuplo) in band storage format.The second dimension of the array bb must be at leastmax(1, n).

1132


work(*) is a workspace array, DIMENSION at least max(1,n).


ldab


ldbb

REAL for chbgvxvl, vuDOUBLE PRECISION for zhbgvx.If range = 'V', the lower and upper bounds of the intervalto be searched for eigenvalues.Constraint: vl< vu.If range = 'A' or 'I', vl and vu are not referenced.



REAL for chbgvxabstolDOUBLE PRECISION for zhbgvx.The absolute error tolerance for the eigenvalues. SeeApplication Notes for more information.


≥ 1. If jobz = 'V', ldz ≥ max(1, n).

ldz

INTEGER. The leading dimension of the output array q; ldq

≥ 1. If jobz = 'V', ldq ≥ max(1, n).

ldq

REAL for chbgvxrworkDOUBLE PRECISION for zhbgvx.Workspace array, DIMENSION at least max(1, 7n).


Output Parameters


1133


bb



REAL for chbgvxwDOUBLE PRECISION for zhbgvx.Array w(*), DIMENSION at least max(1, n).If info = 0, contains the eigenvalues in ascending order.

COMPLEX for chbgvxz, qDOUBLE COMPLEX for zhbgvxArrays:z(ldz,*).The second dimension of z must be at least max(1, n).If jobz = 'V', then if info = 0, z contains the matrix Zof eigenvectors, with the i-th column of z holding theeigenvector associated with w(i). The eigenvectors arenormalized so that ZH*B*Z = I.If jobz = 'N', then z is not referenced.q(ldq,*).The second dimension of q must be at least max(1, n).If jobz = 'V', then q contains the n-by-n matrix used in

the reduction of Ax = λBx to standard form, that is, Cx =

λ x and consequently C to tridiagonal form.If jobz = 'N', then q is not referenced.



1134






Specific details for the routine hbgvx interface are the following:


a


b











Restored based on the presence of the argument z as follows:jobzjobz = 'V', if z is present,jobz = 'N', if z is omitted.Note that there will be an error condition if ifail or q is present andz is omitted.

Restored based on the presence of arguments vl, vu, il, iu as follows:range

1135


range = 'V', if one of or both vl and vu are present,range = 'I', if one of or both il and iu are present,range = 'A', if none of vl, vu, il, iu is present,Note that there will be an error condition if one of or both vl and vuare present and at the same time one of or both il and iu are present.

Application Notes





Generalized Nonsymmetric Eigenproblems

This section describes LAPACK driver routines used for solving generalized nonsymmetriceigenproblems. See also computational routines computational routines that can be called tosolve these problems. Table 4-14 lists all such driver routines for Fortran-77 interface.Respective routine names in Fortran-95 interface are without the first symbol (see RoutineNaming Conventions).

Table 4-14 Driver Routines for Solving Generalized Nonsymmetric Eigenproblems


Computes the generalized eigenvalues, Schur form, and the left and/orright Schur vectors for a pair of nonsymmetric matrices.

?gges

Computes the generalized eigenvalues, Schur form, and, optionally, theleft and/or right matrices of Schur vectors.

?ggesx

Computes the generalized eigenvalues, and the left and/or right generalizedeigenvectors for a pair of nonsymmetric matrices.

?ggev

Computes the generalized eigenvalues, and, optionally, the left and/orright generalized eigenvectors.

?ggevx

1136


?ggesComputes the generalized eigenvalues, Schur form,and the left and/or right Schur vectors for a pairof nonsymmetric matrices.

Syntax

Fortran 77:

call sgges(jobvsl, jobvsr, sort, selctg, n, a, lda, b, ldb, sdim, alphar,alphai, beta, vsl, ldvsl, vsr, ldvsr, work, lwork, bwork, info)

call dgges(jobvsl, jobvsr, sort, selctg, n, a, lda, b, ldb, sdim, alphar,alphai, beta, vsl, ldvsl, vsr, ldvsr, work, lwork, bwork, info)

call cgges(jobvsl, jobvsr, sort, selctg, n, a, lda, b, ldb, sdim, alpha, beta,vsl, ldvsl, vsr, ldvsr, work, lwork, rwork, bwork, info)

call zgges(jobvsl, jobvsr, sort, selctg, n, a, lda, b, ldb, sdim, alpha, beta,vsl, ldvsl, vsr, ldvsr, work, lwork, rwork, bwork, info)

Fortran 95:

call gges(a, b, alphar, alphai, beta [,vsl] [,vsr] [,select] [,sdim] [,info])

call gges(a, b, alpha, beta [, vsl] [,vsr] [,select] [,sdim] [,info])

Description

This routine computes for a pair of n-by-n real/complex nonsymmetric matrices (A,B), thegeneralized eigenvalues, the generalized real/complex Schur form (S,T), optionally, the leftand/or right matrices of Schur vectors (vsl and vsr). This gives the generalized Schurfactorization

(A,B) = ( vsl*S *vsrH, vsl*T*vsrH )

Optionally, it also orders the eigenvalues so that a selected cluster of eigenvalues appears inthe leading diagonal blocks of the upper quasi-triangular matrix S and the upper triangularmatrix T. The leading columns of vsl and vsr then form an orthonormal/unitary basis for thecorresponding left and right eigenspaces (deflating subspaces).

(If only the generalized eigenvalues are needed, use the driver ?ggev instead, which is faster.)

1137


A generalized eigenvalue for a pair of matrices (A,B) is a scalar w or a ratio alpha / beta = w,such that A - w*B is singular. It is usually represented as the pair (alpha, beta), as thereis a reasonable interpretation for beta=0 or for both being zero. A pair of matrices (S,T) is ingeneralized real Schur form if T is upper triangular with non-negative diagonal and S is blockupper triangular with 1-by-1 and 2-by-2 blocks. 1-by-1 blocks correspond to real generalizedeigenvalues, while 2-by-2 blocks of S will be “standardized" by making the correspondingelements of T have the form:

and the pair of corresponding 2-by-2 blocks in S and T will have a complex conjugate pair ofgeneralized eigenvalues. A pair of matrices (S,T) is in generalized complex Schur form if S andT are upper triangular and, in addition, the diagonal of T are non-negative real numbers.

Input Parameters

CHARACTER*1. Must be 'N' or 'V'.jobvslIf jobvsl = 'N', then the left Schur vectors are notcomputed.If jobvsl = 'V', then the left Schur vectors are computed.

CHARACTER*1. Must be 'N' or 'V'.jobvsrIf jobvsr = 'N', then the right Schur vectors are notcomputed.If jobvsr = 'V', then the right Schur vectors arecomputed.

CHARACTER*1. Must be 'N' or 'S'. Specifies whether ornot to order the eigenvalues on the diagonal of thegeneralized Schur form.

sort

If sort = 'N', then eigenvalues are not ordered.If sort = 'S', eigenvalues are ordered (see selctg).

LOGICAL FUNCTION of three REAL arguments for realflavors.

selctg

LOGICAL FUNCTION of two COMPLEX arguments for complexflavors.selctg must be declared EXTERNAL in the calling subroutine.

1138


If sort = 'S', selctg is used to select eigenvalues to sortto the top left of the Schur form.If sort = 'N', selctg is not referenced.For real flavors:An eigenvalue (alphar(j) + alphai(j))/beta(j) is selectedif selctg(alphar(j), alphai(j), beta(j)) is true; that is, ifeither one of a complex conjugate pair of eigenvalues isselected, then both complex eigenvalues are selected.Note that in the ill-conditioned case, a selected complexeigenvalue may no longer satisfy selctg(alphar(j),alphai(j), beta(j)) = .TRUE. after ordering. In thiscase info is set to n+2 .For complex flavors:An eigenvalue alpha(j) / beta(j) is selected ifselctg(alpha(j), beta(j)) is true.Note that a selected complex eigenvalue may no longersatisfy selctg(alpha(j), beta(j)) = .TRUE. after ordering,since ordering may change the value of complex eigenvalues(especially if the eigenvalue is ill-conditioned); in this caseinfo is set to n+2 (see info below).

INTEGER. The order of the matrices A, B, vsl, and vsr (n

≥ 0).

n

REAL for sggesa, b, workDOUBLE PRECISION for dggesCOMPLEX for cggesDOUBLE COMPLEX for zgges.Arrays:a(lda,*) is an array containing the n-by-n matrix A (first ofthe pair of matrices).The second dimension of a must be at least max(1, n).b(ldb,*) is an array containing the n-by-n matrix B (secondof the pair of matrices).The second dimension of b must be at least max(1, n).work is a workspace array, its dimension max(1, lwork).


lda

1139


INTEGER. The first dimension of the array b. Must be atleast max(1, n).

ldb

INTEGER. The first dimensions of the output matrices vsland vsr, respectively. Constraints:

ldvsl, ldvsr

ldvsl ≥ 1. If jobvsl = 'V', ldvsl ≥ max(1, n).

ldvsr ≥ 1. If jobvsr = 'V', ldvsr ≥ max(1, n).


lwork ≥ max(1, 8n+16) for real flavors;

lwork ≥ max(1, 2n) for complex flavors.For good performance, lwork must generally be larger.If lwork = -1, then a workspace query is assumed; theroutine only calculates the optimal size of the work array,returns this value as the first entry of the work array, andno error message related to lwork is issued by xerbla.

REAL for cggesrworkDOUBLE PRECISION for zggesWorkspace array, DIMENSION at least max(1, 8n).This array is used in complex flavors only.

LOGICAL.bworkWorkspace array, DIMENSION at least max(1, n).Not referenced if sort = 'N'.

Output Parameters

On exit, this array has been overwritten by its generalizedSchur form S.

a

On exit, this array has been overwritten by its generalizedSchur form T.

b

INTEGER.sdimIf sort = 'N', sdim= 0.If sort = 'S', sdim is equal to the number of eigenvalues(after sorting) for which selctg is true.Note that for real flavors complex conjugate pairs for whichselctg is true for either eigenvalue count as 2.

REAL for sgges;alphar, alphai

1140


DOUBLE PRECISION for dgges.Arrays, DIMENSION at least max(1, n) each. Contain valuesthat form generalized eigenvalues in real flavors.See beta.

COMPLEX for cgges;alphaDOUBLE COMPLEX for zgges.Array, DIMENSION at least max(1, n). Contain values thatform generalized eigenvalues in complex flavors. See beta.

REAL for sggesbetaDOUBLE PRECISION for dggesCOMPLEX for cggesDOUBLE COMPLEX for zgges.Array, DIMENSION at least max(1, n).For real flavors:On exit, (alphar(j) + alphai(j)*i)/beta(j), j=1,..., n, willbe the generalized eigenvalues.alphar(j) + alphai(j)*i and beta(j), j=1,..., n are thediagonals of the complex Schur form (S,T) that would resultif the 2-by-2 diagonal blocks of the real generalized Schurform of (A,B) were further reduced to triangular form usingcomplex unitary transformations. If alphai(j) is zero, thenthe j-th eigenvalue is real; if positive, then the j-th and(j+1)-st eigenvalues are a complex conjugate pair, withalphai(j+1) negative.For complex flavors:On exit, alpha(j)/beta(j), j=1,..., n, will be the generalizedeigenvalues. alpha(j), j=1,...,n, and beta(j), j=1,..., n arethe diagonals of the complex Schur form (S,T) output bycgges/zgges. The beta(j) will be non-negative real.See also Application Notes below.

REAL for sggesvsl, vsrDOUBLE PRECISION for dggesCOMPLEX for cggesDOUBLE COMPLEX for zgges.Arrays:vsl(ldvsl,*), the second dimension of vsl must be at leastmax(1, n).

1141


If jobvsl = 'V', this array will contain the left Schurvectors.If jobvsl = 'N', vsl is not referenced.vsr(ldvsr,*), the second dimension of vsr must be at leastmax(1, n).If jobvsr = 'V', this array will contain the right Schurvectors.If jobvsr = 'N', vsr is not referenced.


work(1)


i ≤ n:the QZ iteration failed. (A, B) is not in Schur form, butalphar(j), alphai(j) (for real flavors), or alpha(j) (forcomplex flavors), and beta(j), j=info+1,..., n shouldbe correct.i > n: errors that usually indicate LAPACK problems:i = n+1: other than QZ iteration failed in ?hgeqz;i = n+2: after reordering, roundoff changed values of somecomplex eigenvalues so that leading eigenvalues in thegeneralized Schur form no longer satisfy selctg = .TRUE..This could also be caused due to scaling;i = n+3: reordering failed in ?tgsen.



Specific details for the routine gges interface are the following:





1142




Holds the matrix VSL of size (n, n).vsl

Holds the matrix VSR of size (n, n).vsr

Restored based on the presence of the argument vsl as follows:jobvsljobvsl = 'V', if vsl is present,jobvsl = 'N', if vsl is omitted.

Restored based on the presence of the argument vsr as follows:jobvsrjobvsr = 'V', if vsr is present,jobvsr = 'N', if vsr is omitted.


Application Notes





The quotients alphar(j)/beta(j) and alphai(j)/beta(j) may easily over- or underflow, andbeta(j) may even be zero. Thus, you should avoid simply computing the ratio. However, alpharand alphai will be always less than and usually comparable with norm(A) in magnitude, andbeta always less than and usually comparable with norm(B).

1143


?ggesxComputes the generalized eigenvalues, Schur form,and, optionally, the left and/or right matrices ofSchur vectors.

Syntax

Fortran 77:

call sggesx (jobvsl, jobvsr, sort, selctg, sense, n, a, lda, b, ldb, sdim,alphar, alphai, beta, vsl, ldvsl, vsr, ldvsr, rconde, rcondv, work, lwork,iwork, liwork, bwork, info)

call dggesx (jobvsl, jobvsr, sort, selctg, sense, n, a, lda, b, ldb, sdim,alphar, alphai, beta, vsl, ldvsl, vsr, ldvsr, rconde, rcondv, work, lwork,iwork, liwork, bwork, info)

call cggesx (jobvsl, jobvsr, sort, selctg, sense, n, a, lda, b, ldb, sdim,alpha, beta, vsl, ldvsl, vsr, ldvsr, rconde, rcondv, work, lwork, rwork,iwork, liwork, bwork, info)

call zggesx (jobvsl, jobvsr, sort, selctg, sense, n, a, lda, b, ldb, sdim,alpha, beta, vsl, ldvsl, vsr, ldvsr, rconde, rcondv, work, lwork, rwork,iwork, liwork, bwork, info)

Fortran 95:

call ggesx(a, b, alphar, alphai, beta [,vsl] [,vsr] [,select] [,sdim] [,rconde][, rcondv] [,info])

call ggesx(a, b, alpha, beta [, vsl] [,vsr] [,select] [,sdim] [,rconde][,rcondv] [, info])

Description

This routine computes for a pair of n-by-n real/complex nonsymmetric matrices (A,B), thegeneralized eigenvalues, the generalized real/complex Schur form (S,T), optionally, the leftand/or right matrices of Schur vectors (vsl and vsr). This gives the generalized Schurfactorization

(A,B) = ( vsl*S *vsrH, vsl*T*vsrH )

1144


Optionally, it also orders the eigenvalues so that a selected cluster of eigenvalues appears inthe leading diagonal blocks of the upper quasi-triangular matrix S and the upper triangularmatrix T; computes a reciprocal condition number for the average of the selected eigenvalues(rconde); and computes a reciprocal condition number for the right and left deflating subspacescorresponding to the selected eigenvalues (rcondv). The leading columns of vsl and vsr thenform an orthonormal/unitary basis for the corresponding left and right eigenspaces (deflatingsubspaces).

A generalized eigenvalue for a pair of matrices (A,B) is a scalar w or a ratio alpha / beta =w, such that A - w*B is singular. It is usually represented as the pair (alpha, beta), as thereis a reasonable interpretation for beta=0 or for both being zero. A pair of matrices (S,T) is ingeneralized real Schur form if T is upper triangular with non-negative diagonal and S is blockupper triangular with 1-by-1 and 2-by-2 blocks. 1-by-1 blocks correspond to real generalizedeigenvalues, while 2-by-2 blocks of S will be “standardized" by making the correspondingelements of T have the form:

and the pair of corresponding 2-by-2 blocks in S and T will have a complex conjugate pair ofgeneralized eigenvalues. A pair of matrices (S,T) is in generalized complex Schur form if S andT are upper triangular and, in addition, the diagonal of T are non-negative real numbers.

Input Parameters

CHARACTER*1. Must be 'N' or 'V'.jobvslIf jobvsl = 'N', then the left Schur vectors are notcomputed.If jobvsl = 'V', then the left Schur vectors are computed.

CHARACTER*1. Must be 'N' or 'V'.jobvsrIf jobvsr = 'N', then the right Schur vectors are notcomputed.If jobvsr = 'V', then the right Schur vectors arecomputed.

1145


CHARACTER*1. Must be 'N' or 'S'. Specifies whether ornot to order the eigenvalues on the diagonal of thegeneralized Schur form.

sort

If sort = 'N', then eigenvalues are not ordered.If sort = 'S', eigenvalues are ordered (see selctg).

LOGICAL FUNCTION of three REAL arguments for realflavors.

selctg

LOGICAL FUNCTION of two COMPLEX arguments for complexflavors.selctg must be declared EXTERNAL in the calling subroutine.If sort = 'S', selctg is used to select eigenvalues to sortto the top left of the Schur form.If sort = 'N', selctg is not referenced.For real flavors:An eigenvalue (alphar(j) + alphai(j))/beta(j) is selectedif selctg(alphar(j), alphai(j), beta(j)) is true; that is, ifeither one of a complex conjugate pair of eigenvalues isselected, then both complex eigenvalues are selected.Note that in the ill-conditioned case, a selected complexeigenvalue may no longer satisfy selctg(alphar(j),alphai(j), beta(j)) = .TRUE. after ordering. In thiscase info is set to n+2.For complex flavors:An eigenvalue alpha(j) / beta(j) is selected ifselctg(alpha(j), beta(j)) is true.Note that a selected complex eigenvalue may no longersatisfy selctg(alpha(j), beta(j)) = .TRUE. afterordering, since ordering may change the value of complexeigenvalues (especially if the eigenvalue is ill-conditioned);in this case info is set to n+2 (see info below).


sense

If sense = 'N', none are computed;If sense = 'E', computed for average of selectedeigenvalues only;If sense = 'V', computed for selected deflating subspacesonly;If sense = 'B', computed for both.

1146


If sense is 'E', 'V', or 'B', then sort must equal 'S'.

INTEGER. The order of the matrices A, B, vsl, and vsr (n

≥ 0).

n

REAL for sggesxa, b, workDOUBLE PRECISION for dggesxCOMPLEX for cggesxDOUBLE COMPLEX for zggesx.Arrays:a(lda,*) is an array containing the n-by-n matrix A (first ofthe pair of matrices).The second dimension of a must be at least max(1, n).b(ldb,*) is an array containing the n-by-n matrix B (secondof the pair of matrices).The second dimension of b must be at least max(1, n).work is a workspace array, its dimension max(1, lwork).


INTEGER. The first dimension of the array b.ldbMust be at least max(1, n).

INTEGER. The first dimensions of the output matrices vsland vsr, respectively. Constraints:

ldvsl, ldvsr

ldvsl ≥ 1. If jobvsl = 'V', ldvsl ≥ max(1, n).

ldvsr ≥ 1. If jobvsr = 'V', ldvsr ≥ max(1, n).

INTEGER.lworkThe dimension of the array work.For real flavors:

If n=0 then lwork≥1.

If n>0 and sense = 'N', then lwork ≥ max(8*n,6*n+16).

If n>0 and sense = 'E', 'V', or 'B', then lwork ≥max(8*n, 6*n+16, 2*sdim*(n-sdim));For complex flavors:

If n=0 then lwork≥1.

If n>0 and sense = 'N', then lwork ≥ max(1, 2*n);

1147


If n>0 and sense = 'E', 'V', or 'B', then lwork ≥max(1, 2*n, 2*sdim*(n-sdim)).

Note that 2*sdim*(n-sdim) ≤ n*n/2.An error is only returned if lwork < max(8*n, 6*n+16)forreal flavors, and lwork < max(1, 2*n) for complex flavors,but if sense = 'E', 'V', or 'B', this may not be largeenough.If lwork=-1, then a workspace query is assumed; theroutine only calculates the bound on the optimal size of thework array and the minimum size of the iwork array,returns these values as the first entries of the work andiwork arrays, and no error message related to lwork orliwork is issued by xerbla.

REAL for cggesxrworkDOUBLE PRECISION for zggesxWorkspace array, DIMENSION at least max(1, 8n).This array is used in complex flavors only.



If sense = 'N', or n=0, then liwork≥1,

otherwise liwork ≥ (n+6) for real flavors, and liwork ≥(n+2) for complex flavors.If liwork=-1, then a workspace query is assumed; theroutine only calculates the bound on the optimal size of thework array and the minimum size of the iwork array,returns these values as the first entries of the work andiwork arrays, and no error message related to lwork orliwork is issued by xerbla.

LOGICAL.bworkWorkspace array, DIMENSION at least max(1, n).Not referenced if sort = 'N'.

1148

Output Parameters

On exit, this array has been overwritten by its generalizedSchur form S.

a

On exit, this array has been overwritten by its generalizedSchur form T.

b

INTEGER.sdimIf sort = 'N', sdim= 0.If sort = 'S', sdim is equal to the number of eigenvalues(after sorting) for which selctg is true.Note that for real flavors complex conjugate pairs for whichselctg is true for either eigenvalue count as 2.

REAL for sggesx;alphar, alphaiDOUBLE PRECISION for dggesx.Arrays, DIMENSION at least max(1, n) each. Contain valuesthat form generalized eigenvalues in real flavors.See beta.

COMPLEX for cggesx;alphaDOUBLE COMPLEX for zggesx.Array, DIMENSION at least max(1, n). Contain values thatform generalized eigenvalues in complex flavors. See beta.

REAL for sggesxbetaDOUBLE PRECISION for dggesxCOMPLEX for cggesxDOUBLE COMPLEX for zggesx.Array, DIMENSION at least max(1, n).For real flavors:On exit, (alphar(j) + alphai(j)*i)/beta(j), j=1,..., n willbe the generalized eigenvalues.alphar(j) + alphai(j)*i and beta(j), j=1,..., n are thediagonals of the complex Schur form (S,T) that would resultif the 2-by-2 diagonal blocks of the real generalized Schurform of (A,B) were further reduced to triangular form usingcomplex unitary transformations. If alphai(j) is zero, thenthe j-th eigenvalue is real; if positive, then the j-th and(j+1)-st eigenvalues are a complex conjugate pair, withalphai(j+1) negative.For complex flavors:

1149


On exit, alpha(j)/beta(j), j=1,..., n will be the generalizedeigenvalues. alpha(j), j=1,..., n, and beta(j), j=1,...,n arethe diagonals of the complex Schur form (S,T) output bycggesx/zggesx. The beta(j) will be non-negative real.See also Application Notes below.

REAL for sggesxvsl, vsrDOUBLE PRECISION for dggesxCOMPLEX for cggesxDOUBLE COMPLEX for zggesx.Arrays:vsl(ldvsl,*), the second dimension of vsl must be at leastmax(1, n).If jobvsl = 'V', this array will contain the left Schurvectors.If jobvsl = 'N', vsl is not referenced.vsr(ldvsr,*), the second dimension of vsr must be at leastmax(1, n).If jobvsr = 'V', this array will contain the right Schurvectors.If jobvsr = 'N', vsr is not referenced.

REAL for single precision flavorsrconde, rcondvDOUBLE PRECISION for double precision flavors.Arrays, DIMENSION (2) eachIf sense = 'E' or 'B', rconde(1) and rconde(2) containthe reciprocal condition numbers for the average of theselected eigenvalues.Not referenced if sense = 'N' or 'V'.If sense = 'V' or 'B', rcondv(1) and rcondv(2) containthe reciprocal condition numbers for the selected deflatingsubspaces.Not referenced if sense = 'N' or 'E'.


work(1)


iwork(1)


1150


If info = -i, the ith parameter had an illegal value.If info = i, and

i ≤ n:the QZ iteration failed. (A, B) is not in Schur form, butalphar(j), alphai(j) (for real flavors), or alpha(j) (forcomplex flavors), and beta(j), j=info+1,..., n shouldbe correct.i > n: errors that usually indicate LAPACK problems:i = n+1: other than QZ iteration failed in ?hgeqz;i = n+2: after reordering, roundoff changed values of somecomplex eigenvalues so that leading eigenvalues in thegeneralized Schur form no longer satisfy selctg = .TRUE..This could also be caused due to scaling;i = n+3: reordering failed in ?tgsen.



Specific details for the routine ggesx interface are the following:







Holds the matrix VSL of size (n, n).vsl

Holds the matrix VSR of size (n, n).vsr

Holds the vector of length (2).rconde

Holds the vector of length (2).rcondv

Restored based on the presence of the argument vsl as follows:jobvsljobvsl = 'V', if vsl is present,jobvsl = 'N', if vsl is omitted.

Restored based on the presence of the argument vsr as follows:jobvsr

1151


jobvsr = 'V', if vsr is present,jobvsr = 'N', if vsr is omitted.



sense

sense = 'B', if both rconde and rcondv are present,sense = 'E', if rconde is present and rcondv omitted,sense = 'V', if rconde is omitted and rcondv present,sense = 'N', if both rconde and rcondv are omitted.

Note that there will be an error condition if rconde or rcondv are present and select is omitted.

Application Notes





The quotients alphar(j)/beta(j) and alphai(j)/beta(j) may easily over- or underflow, andbeta(j) may even be zero. Thus, you should avoid simply computing the ratio. However, alpharand alphai will be always less than and usually comparable with norm(A) in magnitude, andbeta always less than and usually comparable with norm(B).

1152


?ggevComputes the generalized eigenvalues, and theleft and/or right generalized eigenvectors for a pairof nonsymmetric matrices.

Syntax

Fortran 77:

call sggev(jobvl, jobvr, n, a, lda, b, ldb, alphar, alphai, beta, vl, ldvl,vr, ldvr, work, lwork, info)

call dggev(jobvl, jobvr, n, a, lda, b, ldb, alphar, alphai, beta, vl, ldvl,vr, ldvr, work, lwork, info)

call cggev(jobvl, jobvr, n, a, lda, b, ldb, alpha, beta, vl, ldvl, vr, ldvr,work, lwork, rwork, info)

call zggev(jobvl, jobvr, n, a, lda, b, ldb, alpha, beta, vl, ldvl, vr, ldvr,work, lwork, rwork, info)

Fortran 95:

call ggev(a, b, alphar, alphai, beta [,vl] [,vr] [,info])

call ggev(a, b, alpha, beta [, vl] [,vr] [,info])

Description

This routine computes for a pair of n-by-n real/complex nonsymmetric matrices (A,B), thegeneralized eigenvalues, and optionally, the left and/or right generalized eigenvectors.

A generalized eigenvalue for a pair of matrices (A,B) is a scalar λ or a ratio alpha / beta = λ,

such that A - λ*B is singular. It is usually represented as the pair (alpha, beta), as there isa reasonable interpretation for beta =0 and even for both being zero. The right generalizedeigenvector v(j) corresponding to the generalized eigenvalue l(j) of (A,B) satisfies

A*v*(j) = λ(j)*B*v*(j).

The left generalized eigenvector u(j) corresponding to the generalized eigenvalue λ(j) of (A,B)satisfies

u(j)H*A = λ(j)*u*(j)H*B

1153


where u(j)H denotes the conjugate transpose of u(j).

Input Parameters

CHARACTER*1. Must be 'N' or 'V'.jobvlIf jobvl = 'N', the left generalized eigenvectors are notcomputed;If jobvl = 'V', the left generalized eigenvectors arecomputed.

CHARACTER*1. Must be 'N' or 'V'.jobvrIf jobvr = 'N', the right generalized eigenvectors are notcomputed;If jobvr = 'V', the right generalized eigenvectors arecomputed.

INTEGER. The order of the matrices A, B, vl, and vr (n ≥0).

n

REAL for sggeva, b, workDOUBLE PRECISION for dggevCOMPLEX for cggevDOUBLE COMPLEX for zggev.Arrays:a(lda,*) is an array containing the n-by-n matrix A (first ofthe pair of matrices).The second dimension of a must be at least max(1, n).b(ldb,*) is an array containing the n-by-n matrix B (secondof the pair of matrices).The second dimension of b must be at least max(1, n).work is a workspace array, its dimension max(1, lwork).


lda

INTEGER. The first dimension of the array b. Must be atleast max(1, n).

ldb

INTEGER. The first dimensions of the output matrices vland vr, respectively.

ldvl, ldvr

Constraints:

ldvl ≥ 1. If jobvl = 'V', ldvl ≥ max(1, n).

ldvr ≥ 1. If jobvr = 'V', ldvr ≥ max(1, n).

1154



lwork ≥ max(1, 8n+16) for real flavors;

lwork ≥ max(1, 2n) for complex flavors.For good performance, lwork must generally be larger.If lwork = -1, then a workspace query is assumed; theroutine only calculates the optimal size of the work array,returns this value as the first entry of the work array, andno error message related to lwork is issued by xerbla.

REAL for cggevrworkDOUBLE PRECISION for zggevWorkspace array, DIMENSION at least max(1, 8n).This array is used in complex flavors only.

Output Parameters

On exit, these arrays have been overwritten.a, b

REAL for sggev;alphar, alphaiDOUBLE PRECISION for dggev.Arrays, DIMENSION at least max(1, n) each. Contain valuesthat form generalized eigenvalues in real flavors.See beta.

COMPLEX for cggev;alphaDOUBLE COMPLEX for zggev.Array, DIMENSION at least max(1, n). Contain values thatform generalized eigenvalues in complex flavors. See beta.

REAL for sggevbetaDOUBLE PRECISION for dggevCOMPLEX for cggevDOUBLE COMPLEX for zggev.Array, DIMENSION at least max(1, n).For real flavors:On exit, (alphar(j)+ alphai(j)*i)/beta(j), j=1,..., n, arethe generalized eigenvalues.If alphai(j) is zero, then the j-th eigenvalue is real; ifpositive, then the j-th and (j+1)-st eigenvalues are acomplex conjugate pair, with alphai(j+1) negative.

1155


For complex flavors:On exit, alpha(j)/beta(j), j=1,..., n, are the generalizedeigenvalues.See also Application Notes below.

REAL for sggevvl, vrDOUBLE PRECISION for dggevCOMPLEX for cggevDOUBLE COMPLEX for zggev.Arrays:vl(ldvl,*); the second dimension of vl must be at leastmax(1, n).If jobvl = 'V', the left generalized eigenvectors u(j) arestored one after another in the columns of vl, in the sameorder as their eigenvalues. Each eigenvector is scaled sothe largest component has abs(Re) + abs(Im) = 1.If jobvl = 'N', vl is not referenced.For real flavors:If the j-th eigenvalue is real, then u(j) = vl(:,j), thej-th column of vl.If the j-th and (j+1)-st eigenvalues form a complexconjugate pair, then u(j) = vl(:,j) + i*vl(:,j+1)and u(j+1) = vl(:,j) - i*vl(:,j+1), where i =sqrt(-1).For complex flavors:u(j) = vl(:,j), the j-th column of vl.vr(ldvr,*); the second dimension of vr must be at leastmax(1, n).If jobvr = 'V', the right generalized eigenvectors v(j) arestored one after another in the columns of vr, in the sameorder as their eigenvalues. Each eigenvector is scaled sothe largest component has abs(Re) + abs(Im) = 1.If jobvr = 'N', vr is not referenced.For real flavors:If the j-th eigenvalue is real, then v(j) = vr(:,j), thej-th column of vr.If the j-th and (j+1)-st eigenvalues form a complexconjugate pair, then v(j) = vr(:,j) + i*vr(:,j+1)and v(j+1) = vr(:,j) - i*vr(:,j+1).

1156


For complex flavors:v(j) = vr(:,j), the j-th column of vr.


work(1)

INTEGER.infoIf info = 0, the execution is successful.If info = -i, the i-th parameter had an illegal value.If info = i, and

i ≤ n:the QZ iteration failed. No eigenvectors have been calculated,but alphar(j), alphai(j) (for real flavors), or alpha(j) (forcomplex flavors), and beta(j), j=info+1,..., n shouldbe correct.i > n: errors that usually indicate LAPACK problems:i = n+1: other than QZ iteration failed in ?hgeqz;i = n+2: error return from ?tgevc.



Specific details for the routine ggev interface are the following:










Restored based on the presence of the argument vr as follows:jobvr

1157


jobvr = 'V', if vr is present,jobvr = 'N', if vr is omitted.

Application Notes





The quotients alphar(j)/beta(j) and alphai(j)/beta(j) may easily over- or underflow, andbeta(j) may even be zero. Thus, you should avoid simply computing the ratio. However, alpharand alphai (for real flavors) or alpha (for complex flavors) will be always less than and usuallycomparable with norm(A) in magnitude, and beta always less than and usually comparablewith norm(B).

1158


?ggevxComputes the generalized eigenvalues, and,optionally, the left and/or right generalizedeigenvectors.

Syntax

Fortran 77:

call sggevx(balanc, jobvl, jobvr, sense, n, a, lda, b, ldb, alphar, alphai,beta, vl, ldvl, vr, ldvr, ilo, ihi, lscale, rscale, abnrm, bbnrm, rconde,rcondv, work, lwork, iwork, bwork, info)

call dggevx(balanc, jobvl, jobvr, sense, n, a, lda, b, ldb, alphar, alphai,beta, vl, ldvl, vr, ldvr, ilo, ihi, lscale, rscale, abnrm, bbnrm, rconde,rcondv, work, lwork, iwork, bwork, info)

call cggevx(balanc, jobvl, jobvr, sense, n, a, lda, b, ldb, alpha, beta, vl,ldvl, vr, ldvr, ilo, ihi, lscale, rscale, abnrm, bbnrm, rconde, rcondv, work,lwork, rwork, iwork, bwork, info)

call zggevx(balanc, jobvl, jobvr, sense, n, a, lda, b, ldb, alpha, beta, vl,ldvl, vr, ldvr, ilo, ihi, lscale, rscale, abnrm, bbnrm, rconde, rcondv, work,lwork, rwork, iwork, bwork, info)

Fortran 95:

call ggevx(a, b, alphar, alphai, beta [,vl] [,vr] [,balanc] [,ilo] [,ihi] [,lscale] [,rscale] [,abnrm] [,bbnrm] [,rconde] [,rcondv] [,info])

call ggevx(a, b, alpha, beta [, vl] [,vr] [,balanc] [,ilo] [,ihi] [,lscale][, rscale] [,abnrm] [,bbnrm] [,rconde] [,rcondv] [,info])

Description

This routine computes for a pair of n-by-n real/complex nonsymmetric matrices (A,B), thegeneralized eigenvalues, and optionally, the left and/or right generalized eigenvectors.

Optionally also, it computes a balancing transformation to improve the conditioning of theeigenvalues and eigenvectors (ilo, ihi, lscale, rscale, abnrm, and bbnrm), reciprocal conditionnumbers for the eigenvalues (rconde), and reciprocal condition numbers for the righteigenvectors (rcondv).

1159


A generalized eigenvalue for a pair of matrices (A,B) is a scalar λ or a ratio alpha / beta = λ,

such that A - λ*B is singular. It is usually represented as the pair (alpha, beta), as there isa reasonable interpretation for beta=0 and even for both being zero. The right generalized

eigenvector v(j) corresponding to the generalized eigenvalue λ(j) of (A,B) satisfies

A*v*(j) = λ(j)*B*v*(j).

The left generalized eigenvector u(j) corresponding to the generalized eigenvalue λ(j) of (A,B)satisfies

u(j)H*A = l(j)*u*(j)H*B

where u(j)H denotes the conjugate transpose of u(j).

Input Parameters

CHARACTER*1. Must be 'N', 'P', 'S', or 'B'. Specifies thebalance option to be performed.

balanc

If balanc = 'N', do not diagonally scale or permute;If balanc = 'P', permute only;If balanc = 'S', scale only;If balanc = 'B', both permute and scale.Computed reciprocal condition numbers will be for thematrices after balancing and/or permuting. Permuting doesnot change condition numbers (in exact arithmetic), butbalancing does.

CHARACTER*1. Must be 'N' or 'V'.jobvlIf jobvl = 'N', the left generalized eigenvectors are notcomputed;If jobvl = 'V', the left generalized eigenvectors arecomputed.

CHARACTER*1. Must be 'N' or 'V'.jobvrIf jobvr = 'N', the right generalized eigenvectors are notcomputed;If jobvr = 'V', the right generalized eigenvectors arecomputed.


sense

If sense = 'N', none are computed;

1160


If sense = 'E', computed for eigenvalues only;If sense = 'V', computed for eigenvectors only;If sense = 'B', computed for eigenvalues andeigenvectors.

INTEGER. The order of the matrices A, B, vl, and vr (n ≥0).

n

REAL for sggevxa, b, workDOUBLE PRECISION for dggevxCOMPLEX for cggevxDOUBLE COMPLEX for zggevx.Arrays:a(lda,*) is an array containing the n-by-n matrix A (first ofthe pair of matrices).The second dimension of a must be at least max(1, n).b(ldb,*) is an array containing the n-by-n matrix B (secondof the pair of matrices).The second dimension of b must be at least max(1, n).work is a workspace array, its dimension max(1, lwork).


INTEGER. The first dimension of the array b.ldbMust be at least max(1, n).

INTEGER. The first dimensions of the output matrices vland vr, respectively.

ldvl, ldvr

Constraints:

ldvl ≥ 1. If jobvl = 'V', ldvl ≥ max(1, n).

ldvr ≥ 1. If jobvr = 'V', ldvr ≥ max(1, n).

INTEGER.lwork

The dimension of the array work. lwork ≥ max(1, 2*n);For real flavors:If balanc = 'S', or 'B', or jobvl = 'V', or jobvr =

'V', then lwork ≥ max(1, 6*n);

if sense = 'E', or 'B', then lwork ≥ max(1, 10*n);

if sense = 'V', or 'B', lwork ≥ (2n2+ 8*n+16).For complex flavors:

1161


if sense = 'E', lwork ≥ max(1, 4*n);

if sense = 'V', or 'B', lwork ≥max(1, 2*n2+ 2*n).If lwork = -1, then a workspace query is assumed; theroutine only calculates the optimal size of the work array,returns this value as the first entry of the work array, andno error message related to lwork is issued by xerbla.

REAL for cggevxrworkDOUBLE PRECISION for zggevxWorkspace array, DIMENSION at least max(1, 6*n) ifbalanc = 'S', or 'B', and at least max(1, 2*n)otherwise.This array is used in complex flavors only.

INTEGER.iworkWorkspace array, DIMENSION at least (n+6) for real flavorsand at least (n+2) for complex flavors.Not referenced if sense = 'E'.

LOGICAL. Workspace array, DIMENSION at least max(1, n).bworkNot referenced if sense = 'N'.

Output Parameters

On exit, these arrays have been overwritten.a, bIf jobvl = 'V' or jobvr = 'V' or both, then a containsthe first part of the real Schur form of the “balanced”versions of the input A and B, and b contains its second part.

REAL for sggevx;alphar, alphaiDOUBLE PRECISION for dggevx.Arrays, DIMENSION at least max(1, n) each. Contain valuesthat form generalized eigenvalues in real flavors.See beta.

COMPLEX for cggevx;alphaDOUBLE COMPLEX for zggevx.Array, DIMENSION at least max(1, n). Contain values thatform generalized eigenvalues in complex flavors. See beta.

REAL for sggevxbetaDOUBLE PRECISION for dggevxCOMPLEX for cggevx

1162


DOUBLE COMPLEX for zggevx.Array, DIMENSION at least max(1, n).For real flavors:On exit, (alphar(j) + alphai(j)*i)/beta(j), j=1,..., n, willbe the generalized eigenvalues.If alphai(j) is zero, then the j-th eigenvalue is real; ifpositive, then the j-th and (j+1)-st eigenvalues are acomplex conjugate pair, with alphai(j+1) negative.For complex flavors:On exit, alpha(j)/beta(j), j=1,..., n, will be the generalizedeigenvalues.See also Application Notes below.

REAL for sggevxvl, vrDOUBLE PRECISION for dggevxCOMPLEX for cggevxDOUBLE COMPLEX for zggevx.Arrays:vl(ldvl,*); the second dimension of vl must be at leastmax(1, n).If jobvl = 'V', the left generalized eigenvectors u(j) arestored one after another in the columns of vl, in the sameorder as their eigenvalues. Each eigenvector will be scaledso the largest component have abs(Re) + abs(Im) = 1.If jobvl = 'N', vl is not referenced.For real flavors:If the j-th eigenvalue is real, then u(j) = vl(:,j), thej-th column of vl.If the j-th and (j+1)-st eigenvalues form a complexconjugate pair, then u(j) = vl(:,j) + i*vl(:,j+1)and u(j+1) = vl(:,j) - i*vl(:,j+1), where i =sqrt(-1).For complex flavors:u(j) = vl(:,j), the j-th column of vl.vr(ldvr,*); the second dimension of vr must be at leastmax(1, n).

1163


If jobvr = 'V', the right generalized eigenvectors v(j) arestored one after another in the columns of vr, in the sameorder as their eigenvalues. Each eigenvector will be scaledso the largest component have abs(Re) + abs(Im) = 1.If jobvr = 'N', vr is not referenced.For real flavors:If the j-th eigenvalue is real, then v(j) = vr(:,j), thej-th column of vr.If the j-th and (j+1)-st eigenvalues form a complexconjugate pair, then v(j) = vr(:,j) + i*vr(:,j+1)and v(j+1) = vr(:,j) - i*vr(:,j+1).For complex flavors:v(j) = vr(:,j), the j-th column of vr.

INTEGER. ilo and ihi are integer values such that on exitA(i,j) = 0 and B(i,j) = 0 if i > j and j = 1,...,ilo-1 or i = ihi+1,..., n.

ilo, ihi

If balanc = 'N' or 'S', ilo = 1 and ihi = n.

REAL for single-precision flavorslscale, rscaleDOUBLE PRECISION for double-precision flavors.Arrays, DIMENSION at least max(1, n) each.lscale contains details of the permutations and scalingfactors applied to the left side of A and B.If PL(j) is the index of the row interchanged with row j, andDL(j) is the scaling factor applied to row j, thenlscale(j) = PL(j), for j = 1,..., ilo-1= DL(j), for j = ilo,...,ihi= PL(j) for j = ihi+1,..., n.The order in which the interchanges are made is n to ihi+1,then 1 to ilo-1.rscale contains details of the permutations and scalingfactors applied to the right side of A and B.If PR(j) is the index of the column interchanged with columnj, and DR(j) is the scaling factor applied to column j, thenrscale(j) = PR(j), for j = 1,..., ilo-1= DR(j), for j = ilo,...,ihi= PR(j) for j = ihi+1,..., n.The order in which the interchanges are made is n to ihi+1,then 1 to ilo-1.

1164


REAL for single-precision flavorsabnrm, bbnrmDOUBLE PRECISION for double-precision flavors.The one-norms of the balanced matrices A and B,respectively.


rconde, rcondv

Arrays, DIMENSION at least max(1, n) each.If sense = 'E', or 'B', rconde contains the reciprocalcondition numbers of the eigenvalues, stored in consecutiveelements of the array. For a complex conjugate pair ofeigenvalues two consecutive elements of rconde are set tothe same value. Thus rconde(j), rcondv(j), and the j-thcolumns of vl and vr all correspond to the same eigenpair(but not in general the j-th eigenpair, unless all eigenpairsare selected).If sense = 'N', or 'V', rconde is not referenced.If sense = 'V', or 'B', rcondv contains the estimatedreciprocal condition numbers of the eigenvectors, stored inconsecutive elements of the array. For a complexeigenvector two consecutive elements of rcondv are set tothe same value.If the eigenvalues cannot be reordered to computercondv(j), rcondv(j) is set to 0; this can only occur whenthe true value would be very small anyway.If sense = 'N', or 'E', rcondv is not referenced.


work(1)

INTEGER.infoIf info = 0, the execution is successful.If info = -i, the i-th parameter had an illegal value.If info = i, and

i ≤ n:the QZ iteration failed. No eigenvectors have been calculated,but alphar(j), alphai(j) (for real flavors), or alpha(j) (forcomplex flavors), and beta(j), j=info+1,..., n shouldbe correct.i > n: errors that usually indicate LAPACK problems:

1165


i = n+1: other than QZ iteration failed in ?hgeqz;i = n+2: error return from ?tgevc.



Specific details for the routine ggevx interface are the following:











Holds the vector of length (n).rconde

Holds the vector of length (n).rcondv

Must be 'N', 'B', or 'P'. The default value is 'N'.balanc




sense

sense = 'B', if both rconde and rcondv are present,sense = 'E', if rconde is present and rcondv omitted,sense = 'V', if rconde is omitted and rcondv present,sense = 'N', if both rconde and rcondv are omitted.

1166


Application Notes





The quotients alphar(j)/beta(j) and alphai(j)/beta(j) may easily over- or underflow, andbeta(j) may even be zero. Thus, you should avoid simply computing the ratio. However, alpharand alphai (for real flavors) or alpha (for complex flavors) will be always less than and usuallycomparable with norm(A) in magnitude, and beta always less than and usually comparablewith norm(B).

1167


5LAPACK Auxiliary and UtilityRoutines

This chapter describes the Intel® Math Kernel Library implementation of LAPACK auxiliary and utilityroutines. The library includes auxiliary routines for both real and complex data.

Auxiliary RoutinesRoutine naming conventions, mathematical notation, and matrix storage schemes used for LAPACKauxiliary routines are the same as for the driver and computational routines described in previouschapters.

The table below summarizes information about the available LAPACK auxiliary routines.

Table 5-1 LAPACK Auxiliary Routines

DescriptionDataTypes

Routine Name

Conjugates a complex vector.c, z?lacgv

Multiplies a complex matrix by a square real matrix.c, z?lacrm

Performs a linear transformation of a pair of complex vectors.c, z?lacrt

Computes the eigenvalues and eigenvectors of a 2-by-2 complexsymmetric matrix.

c, z?laesy

Applies a plane rotation with real cosine and complex sine to apair of complex vectors.

c, z?rot

Computes a matrix-vector product for complex vectors using acomplex symmetric packed matrix

c, z?spmv

Performs the symmetrical rank-1 update of a complex symmetricpacked matrix.

c, z?spr

Computes a matrix-vector product for a complex symmetricmatrix.

c, z?symv

Performs the symmetric rank-1 update of a complex symmetricmatrix.

c, z?syr

1169


Routine Name

Finds the index of the vector element whose real part hasmaximum absolute value.

c, zi?max1

Forms the 1-norm of the complex vector using the trueabsolute value.

sc, dz?sum1

Computes the LU factorization of a general band matrix usingthe unblocked version of the algorithm.

s, d, c, z?gbtf2

Reduces a general matrix to bidiagonal form using anunblocked algorithm.

s, d, c, z?gebd2

Reduces a general square matrix to upper Hessenberg formusing an unblocked algorithm.

s, d, c, z?gehd2

Computes the LQ factorization of a general rectangular matrixusing an unblocked algorithm.

s, d, c, z?gelq2

Computes the QL factorization of a general rectangular matrixusing an unblocked algorithm.

s, d, c, z?geql2

Computes the QR factorization of a general rectangular matrixusing an unblocked algorithm.

s, d, c, z?geqr2

Computes the RQ factorization of a general rectangular matrixusing an unblocked algorithm.

s, d, c, z?gerq2

Solves a system of linear equations using the LU factorizationwith complete pivoting computed by ?getc2.

s, d, c, z?gesc2

Computes the LU factorization with complete pivoting of thegeneral n-by-n matrix.

s, d, c, z?getc2

Computes the LU factorization of a general m-by-n matrixusing partial pivoting with row interchanges (unblockedalgorithm).

s, d, c, z?getf2

Solves a system of linear equations with a tridiagonal matrixusing the LU factorization computed by ?gttrf.

s, d, c, z?gtts2

Tests input for NaN.s, d,?isnan

1170



Routine Name

Tests input for NaN by comparing itwo arguments forinequality.

s, d,?laisnan

Reduces the first nb rows and columns of a general matrix toa bidiagonal form.

s, d, c, z?labrd

Estimates the 1-norm of a square matrix, using reversecommunication for evaluating matrix-vector products.

s, d, c, z?lacn2

Estimates the 1-norm of a square matrix, using reversecommunication for evaluating matrix-vector products.

s, d, c, z?lacon

Copies all or part of one two-dimensional array to another.s, d, c, z?lacpy

Performs complex division in real arithmetic, avoidingunnecessary overflow.

s, d, c, z?ladiv

Computes the eigenvalues of a 2-by-2 symmetric matrix.s, d?lae2

Computes the number of eigenvalues of a real symmetrictridiagonal matrix which are less than or equal to a givenvalue, and performs other tasks required by the routine?stebz.

s, d?laebz

Used by ?stedc. Computes all eigenvalues and correspondingeigenvectors of an unreduced symmetric tridiagonal matrixusing the divide and conquer method.

s, d, c, z?laed0

Used by sstedc/dstedc. Computes the updated eigensystemof a diagonal matrix after modification by a rank-onesymmetric matrix. Used when the original matrix is tridiagonal.

s, d?laed1

Used by sstedc/dstedc. Merges eigenvalues and deflatessecular equation. Used when the original matrix is tridiagonal.

s, d?laed2

Used by sstedc/dstedc. Finds the roots of the secularequation and updates the eigenvectors. Used when the originalmatrix is tridiagonal.

s, d?laed3

1171

LAPACK Auxiliary and Utility Routines 5


Routine Name

Used by sstedc/dstedc. Finds a single root of the secularequation.

s, d?laed4

Used by sstedc/dstedc. Solves the 2-by-2 secular equation.s, d?laed5

Used by sstedc/dstedc. Computes one Newton step insolution of the secular equation.

s, d?laed6

Used by ?stedc. Computes the updated eigensystem of adiagonal matrix after modification by a rank-one symmetricmatrix. Used when the original matrix is dense.

s, d, c, z?laed7

Used by ?stedc. Merges eigenvalues and deflates secularequation. Used when the original matrix is dense.

s, d, c, z?laed8

Used by sstedc/dstedc. Finds the roots of the secularequation and updates the eigenvectors. Used when the originalmatrix is dense.

s, d?laed9

Used by ?stedc. Computes the Z vector determining therank-one modification of the diagonal matrix. Used when theoriginal matrix is dense.

s, d?laeda

Computes a specified right or left eigenvector of an upperHessenberg matrix by inverse iteration.

s, d, c, z?laein

Computes the eigenvalues and eigenvectors of a 2-by-2symmetric/Hermitian matrix.

s, d, c, z?laev2

Swaps adjacent diagonal blocks of a real upperquasi-triangular matrix in Schur canonical form, by anorthogonal similarity transformation.

s, d?laexc

Computes the eigenvalues of a 2-by-2 generalized eigenvalueproblem, with scaling as necessary to avoid over-/underflow.

s, d?lag2

Computes 2-by-2 orthogonal matrices U, V, and Q, and appliesthem to matrices A and B such that the rows of thetransformed A and B are parallel.

s, d?lags2

1172



Routine Name

Computes an LU factorization of a matrix T-λI, where T is a

general tridiagonal matrix, and λ a scalar, using partialpivoting with row interchanges.

s, d?lagtf

Performs a matrix-matrix product of the form C = αab+βC,where A is a tridiagonal matrix, B and C are rectangular

matrices, and α and β are scalars, which may be 0, 1, or -1.

s, d, c, z?lagtm

Solves the system of equations (T-λI)x = y or (T-λI)Tx = y

,where T is a general tridiagonal matrix and λ a scalar, usingthe LU factorization computed by ?lagtf.

s, d?lagts

Computes the Generalized Schur factorization of a real 2-by-2matrix pencil (A,B) where B is upper triangular.

s, d?lagv2

Computes the eigenvalues and Schur factorization of an upperHessenberg matrix, using the double-shift/single-shift QRalgorithm.

s, d, c, z?lahqr

Reduces the first nb columns of a general rectangular matrixA so that elements below the k-th subdiagonal are zero, andreturns auxiliary matrices which are needed to apply thetransformation to the unreduced part of A.

s, d, c, z?lahrd

Reduces the specified number of first columns of a generalrectangular matrix A so that elements below thespecifiedsubdiagonal are zero, and returns auxiliary matrices whichare needed to apply the transformation to the unreduced partof A.

s, d, c, z?lahr2

Applies one step of incremental condition estimation.s, d, c, z?laic1

Solves a 1-by-1 or 2-by-2 linear system of equations of thespecified form.

s, d?laln2

Applies back multiplying factors in solving the least squaresproblem using divide and conquer SVD approach. Used by?gelsd.

s, d, c, z?lals0

1173



Routine Name

Computes the SVD of the coefficient matrix in compact form.Used by ?gelsd.

s, d, c, z?lalsa

Uses the singular value decomposition of A to solve the leastsquares problem.

s, d, c, z?lalsd

Creates a permutation list to merge the entries of twoindependently sorted sets into a single set sorted in ascendingorder.

s, d?lamrg

Computes the Sturm count.s, d?laneg

Returns the value of the 1-norm, Frobenius norm,infinity-norm, or the largest absolute value of any element ofgeneral band matrix.

s, d, c, z?langb

Returns the value of the 1-norm, Frobenius norm,infinity-norm, or the largest absolute value of any element ofa general rectangular matrix.

s, d, c, z?lange

Returns the value of the 1-norm, Frobenius norm,infinity-norm, or the largest absolute value of any element ofa general tridiagonal matrix.

s, d, c, z?langt

Returns the value of the 1-norm, Frobenius norm,infinity-norm, or the largest absolute value of any element ofan upper Hessenberg matrix.

s, d, c, z?lanhs

Returns the value of the 1-norm, or the Frobenius norm, orthe infinity norm, or the element of largest absolute value ofa symmetric band matrix.

s, d, c, z?lansb

Returns the value of the 1-norm, or the Frobenius norm, orthe infinity norm, or the element of largest absolute value ofa Hermitian band matrix.

c, z?lanhb

Returns the value of the 1-norm, or the Frobenius norm, orthe infinity norm, or the element of largest absolute value ofa symmetric matrix supplied in packed form.

s, d, c, z?lansp

1174



Routine Name

Returns the value of the 1-norm, or the Frobenius norm, orthe infinity norm, or the element of largest absolute value ofa complex Hermitian matrix supplied in packed form.

c, z?lanhp

Returns the value of the 1-norm, or the Frobenius norm, orthe infinity norm, or the element of largest absolute value ofa real symmetric or complex Hermitian tridiagonal matrix.

s, d/c, z?lanst/?lanht

Returns the value of the 1-norm, or the Frobenius norm, orthe infinity norm, or the element of largest absolute value ofa real/complex symmetric matrix.

s, d, c, z?lansy

Returns the value of the 1-norm, or the Frobenius norm, orthe infinity norm, or the element of largest absolute value ofa complex Hermitian matrix.

c, z?lanhe

Returns the value of the 1-norm, or the Frobenius norm, orthe infinity norm, or the element of largest absolute value ofa triangular band matrix.

s, d, c, z?lantb

Returns the value of the 1-norm, or the Frobenius norm, orthe infinity norm, or the element of largest absolute value ofa triangular matrix supplied in packed form.

s, d, c, z?lantp

Returns the value of the 1-norm, or the Frobenius norm, orthe infinity norm, or the element of largest absolute value ofa trapezoidal or triangular matrix.

s, d, c, z?lantr

Computes the Schur factorization of a real 2-by-2nonsymmetric matrix in standard form.

s, d?lanv2

Measures the linear dependence of two vectors.s, d, c, z?lapll

Performs a forward or backward permutation of the columnsof a matrix.

s, d, c, z?lapmt

Returns sqrt(x2+y2).s, d?lapy2

Returns sqrt(x2+y2+z2).s, d?lapy3

1175



Routine Name

Scales a general band matrix, using row and column scalingfactors computed by ?gbequ.

s, d, c, z?laqgb

Scales a general rectangular matrix, using row and columnscaling factors computed by ?geequ.

s, d, c, z?laqge

Scales a Hermetian band matrix, using scaling factorscomputed by ?pbequ.

c, z?laqhb

Computes a QR factorization with column pivoting of thematrix block.

s, d, c, z?laqp2

Computes a step of QR factorization with column pivoting ofa real m-by-n matrix A by using BLAS level 3.

s, d, c, z?laqps

Computes the eigenvalues of a Hessenberg matrix, andoptionally the marixes from the Schur decomposition.

s, d, c, z?laqr0

Sets a scalar multiple of the first column of the product of2-by-2 or 3-by-3 matrix H and specified shifts.

s, d, c, z?laqr1

Performs the orthogonal/unitary similarity transformation ofa Hessenberg matrix to detect and deflate fully convergedeigenvalues from a trailing principal submatrix (aggresiveearly deflation).

s, d, c, z?laqr2

Performs the orthogonal/unitary similarity transformation ofa Hessenberg matrix to detect and deflate fully convergedeigenvalues from a trailing principal submatrix (aggresiveearly deflation).

s, d, c, z?laqr3

Computes the eigenvalues of a Hessenberg matrix, andoptionally the marices from the Schur decomposition.

s, d, c, z?laqr4

Performs a single small-bulge multi-shift QR sweep.s, d, c, z?laqr5

Scales a symmetric/Hermitian band matrix, using scalingfactors computed by ?pbequ.

s, d, c, z?laqsb

1176



Routine Name

Scales a symmetric/Hermitian matrix in packed storage, usingscaling factors computed by ?ppequ.

s, d, c, z?laqsp

Scales a symmetric/Hermitian matrix, using scaling factorscomputed by ?poequ.

s, d, c, z?laqsy

Solves a real quasi-triangular system of equations, or acomplex quasi-triangular system of special form, in realarithmetic.

s, d?laqtr

Computes the (scaled) r-th column of the inverse of thesubmatrix in rows b1 through bn of the tridiagonal matrix ldLT

- σI.

s, d, c, z?lar1v

Applies a vector of plane rotations with real cosines andreal/complex sines from both sides to a sequence of 2-by-2symmetric/Hermitian matrices.

s, d, c, z?lar2v

Applies an elementary reflector to a general rectangularmatrix.

s, d, c, z?larf

Applies a block reflector or its transpose/conjugate-transposeto a general rectangular matrix.

s, d, c, z?larfb

Generates an elementary reflector (Householder matrix).s, d, c, z?larfg

Forms the triangular factor T of a block reflector H = I - vtvHs, d, c, z?larft

Applies an elementary reflector to a general rectangularmatrix, with loop unrolling when the reflector has order ≤ 10.

s, d, c, z?larfx

Generates a vector of plane rotations with real cosines andreal/complex sines.

s, d, c, z?largv

Returns a vector of random numbers from a uniform or normaldistribution.

s, d, c, z?larnv

Computes the splitting points with the specified threshold.s, d?larra

1177



Routine Name

Provides limited bisection to locate eigenvalues for moreaccuracy.

s, d?larrb

Computes the number of eigenvalues of the symmetrictridiagonal matrix.

s, d?larrc

Computes the eigenvalues of a symmetric tridiagonal matrixto suitable accuracy.

s, d?larrd

Given the tridiagonal matrix T, sets small off-diagonalelements to zero and for each unreduced block Ti, finds baserepresentations and eigenvalues.

s, d?larre

Finds a new relatively robust representation such that at leastone of the eigenvalues is relatively isolated.

s, d?larrf

Performs refinement of the initial estimates of the eigenvaluesof the matrix T.

s, d?larrj

Computes one eigenvalue of a symmetric tridiagonal matrixT to suitable accuracy.

s, d?larrk

Performs tests to decide whether the symmetric tridiagonalmatrix T warrants expensive computations which guaranteehigh relative accuracy in the eigenvalues.

s, d?larrr

Computes the eigenvectors of the tridiagonal matrix T = L DLT given L, D and the eigenvalues of L D LT.

s, d, c, z?larrv

Generates a plane rotation with real cosine and real/complexsine.

s, d, c, z?lartg

Applies a vector of plane rotations with real cosines andreal/complex sines to the elements of a pair of vectors.

s, d, c, z?lartv

Returns a vector of n random real numbers from a uniformdistribution.

s, d?laruv

Applies an elementary reflector (as returned by ?tzrzf) toa general matrix.

s, d, c, z?larz

1178



Routine Name

Applies a block reflector or its transpose/conjugate-transposeto a general matrix.

s, d, c, z?larzb

Forms the triangular factor T of a block reflector H = I - vtvH.s, d, c, z?larzt

Computes singular values of a 2-by-2 triangular matrix.s, d?las2

Multiplies a general rectangular matrix by a real scalar definedas cto/cfrom.

s, d, c, z?lascl

Computes the singular values of a real upper bidiagonaln-by-m matrix B with diagonal d and off-diagonal e. Used by?bdsdc.

s, d?lasd0

Computes the SVD of an upper bidiagonal matrix B of thespecified size. Used by ?bdsdc.

s, d?lasd1

Merges the two sets of singular values together into a singlesorted set. Used by ?bdsdc.

s, d?lasd2

Finds all square roots of the roots of the secular equation, asdefined by the values in D and Z, and then updates thesingular vectors by matrix multiplication. Used by ?bdsdc.

s, d?lasd3

Computes the square root of the i-th updated eigenvalue ofa positive symmetric rank-one modification to a positivediagonal matrix. Used by ?bdsdc.

s, d?lasd4

Computes the square root of the i-th eigenvalue of a positivesymmetric rank-one modification of a 2-by-2 diagonalmatrix.Used by ?bdsdc.

s, d?lasd5

Computes the SVD of an updated upper bidiagonal matrixobtained by merging two smaller ones by appending a row.Used by ?bdsdc.

s, d?lasd6

Merges the two sets of singular values together into a singlesorted set. Then it tries to deflate the size of the problem.Used by ?bdsdc.

s, d?lasd7

1179



Routine Name

Finds the square roots of the roots of the secular equation,and stores, for each element in D, the distance to its twonearest poles. Used by ?bdsdc.

s, d?lasd8

Finds the square roots of the roots of the secular equation,and stores, for each element in D, the distance to its twonearest poles. Used by ?bdsdc.

s, d?lasd9

Computes the singular value decomposition (SVD) of a realupper bidiagonal matrix with diagonal d and off-diagonal e.Used by ?bdsdc.

s, d?lasda

Computes the SVD of a real bidiagonal matrix with diagonald and off-diagonal e. Used by ?bdsdc.

s, d?lasdq

Creates a tree of subproblems for bidiagonal divide andconquer. Used by ?bdsdc.

s, d?lasdt

Initializes the off-diagonal elements and the diagonal elementsof a matrix to given values.

s, d, c, z?laset

Computes the singular values of a real square bidiagonalmatrix. Used by ?bdsqr.

s, d?lasq1

Computes all the eigenvalues of the symmetric positivedefinite tridiagonal matrix associated with the qd Array Z tohigh relative accuracy. Used by ?bdsqr and ?stegr.

s, d?lasq2

Checks for deflation, computes a shift and calls dqds. Usedby ?bdsqr.

s, d?lasq3

Computes an approximation to the smallest eigenvalue usingvalues of d from the previous transform. Used by ?bdsqr.

s, d?lasq4

Computes one dqds transform in ping-pong form. Used by?bdsqr and ?stegr.

s, d?lasq5

Computes one dqd transform in ping-pong form. Used by?bdsqr and ?stegr.

s, d?lasq6

1180



Routine Name

Applies a sequence of plane rotations to a general rectangularmatrix.

s, d, c, z?lasr

Sorts numbers in increasing or decreasing order.s, d?lasrt

Updates a sum of squares represented in scaled form.s, d, c, z?lassq

Computes the singular value decomposition of a 2-by-2triangular matrix.

s, d?lasv2

Performs a series of row interchanges on a general rectangularmatrix.

s, d, c, z?laswp

Solves the Sylvester matrix equation where the matrices areof order 1 or 2.

s, d?lasy2

Computes a partial factorization of a real/complex symmetricmatrix, using the diagonal pivoting method.

s, d, c, z?lasyf

Computes a partial factorization of a complex Hermitianindefinite matrix, using the diagonal pivoting method.

c, z?lahef

Solves a triangular banded system of equations.s, d, c, z?latbs

Uses the LU factorization of the n-by-n matrix computed by?getc2 and computes a contribution to the reciprocalDif-estimate.

s, d, c, z?latdf

Solves a triangular system of equations with the matrix heldin packed storage.

s, d, c, z?latps

Reduces the first nb rows and columns of asymmetric/Hermitian matrix A to real tridiagonal form by anorthogonal/unitary similarity transformation.

s, d, c, z?latrd

Solves a triangular system of equations with the scale factorset to prevent overflow.

s, d, c, z?latrs

Factors an upper trapezoidal matrix by means oforthogonal/unitary transformations.

s, d, c, z?latrz

1181



Routine Name

Computes the product UUH or LHL, where U and L are upperor lower triangular matrices (unblocked algorithm).

s, d, c, z?lauu2

Computes the product UUH or LHL, where U and L are upperor lower triangular matrices (blocked algorithm).

s, d, c, z?lauum

Checks for deflation, computes a shift and calls dqdss, d,?lazq3

Computes an approximation to the smallest eigenvalue usingvalues of d from the previous transform.

s, d,?lazq4

Generates all or part of the orthogonal/unitary matrix Q froma QL factorization determined by ?geqlf (unblockedalgorithm).

s, d/c, z?org2l/?ung2l

Generates all or part of the orthogonal/unitary matrix Q froma QR factorization determined by ?geqrf (unblockedalgorithm).

s, d/c, z?org2r/?ung2r

Generates all or part of the orthogonal/unitary matrix Q froman LQ factorization determined by ?gelqf (unblockedalgorithm).

s, d/c, z?orgl2/?ungl2

Generates all or part of the orthogonal/unitary matrix Q froman RQ factorization determined by ?gerqf (unblockedalgorithm).

s, d/c, z?orgr2/?ungr2

Multiplies a general matrix by the orthogonal/unitary matrixfrom a QL factorization determined by ?geqlf (unblockedalgorithm).

s, d/c, z?orm2l/?unm2l

Multiplies a general matrix by the orthogonal/unitary matrixfrom a QR factorization determined by ?geqrf (unblockedalgorithm).

s, d/c, z?orm2r/?unm2r

Multiplies a general matrix by the orthogonal/unitary matrixfrom a LQ factorization determined by ?gelqf (unblockedalgorithm).

s, d/c, z?orml2/?unml2

1182



Routine Name

Multiplies a general matrix by the orthogonal/unitary matrixfrom a RQ factorization determined by ?gerqf (unblockedalgorithm).

s, d/c, z?ormr2/?unmr2

Multiplies a general matrix by the orthogonal/unitary matrixfrom a RZ factorization determined by ?tzrzf (unblockedalgorithm).

s, d/c, z?ormr3/?unmr3

Computes the Cholesky factorization of a symmetric/Hermitian positive definite band matrix (unblocked algorithm).

s, d, c, z?pbtf2

Computes the Cholesky factorization of a symmetric/Hermitianpositive definite matrix (unblocked algorithm).

s, d, c, z?potf2

Solves a tridiagonal system of the form AX=B using the L DLH factorization computed by ?pttrf.

s, d, c, z?ptts2

Multiplies a vector by the reciprocal of a real scalar.s, d, cs,zd

?rscl

Reduces a symmetric/Hermitian definite generalizedeigenproblem to standard form, using the factorization resultsobtained from ?potrf (unblocked algorithm).

s, d/c, z?sygs2/?hegs2

Reduces a symmetric/Hermitian matrix to real symmetrictridiagonal form by an orthogonal/unitary similaritytransformation (unblocked algorithm).

s, d/c, z?sytd2/?hetd2

Computes the factorization of a real/complex symmetricindefinite matrix, using the diagonal pivoting method(unblocked algorithm).

s, d, c, z?sytf2

Computes the factorization of a complex Hermitian matrix,using the diagonal pivoting method (unblocked algorithm).

c, z?hetf2

Swaps adjacent diagonal blocks in an upper (quasi) triangularmatrix pair by an orthogonal/unitary equivalencetransformation.

s, d, c, z?tgex2

1183



Routine Name

Solves the generalized Sylvester equation (unblockedalgorithm).

s, d, c, z?tgsy2

Computes the inverse of a triangular matrix (unblockedalgorithm).

s, d, c, z?trti2

Converts a complex single precision matrix to a complexdouble precision matrix.

c → zclag2z

Converts a double precision matrix to a single precision matrix.d → sdlag2s

Converts a single precision matrix to a double precision matrix.s → dslag2d

Converts a complex double precision matrix to a complexsingle precision matrix.

z → czlag2c

?lacgvConjugates a complex vector.

Syntax

call clacgv( n, x, incx )

call zlacgv( n, x, incx )

Description

This routine conjugates a complex vector x of length n and increment incx (see “VectorArguments in BLAS” in Appendix B).

Input Parameters

INTEGER. The length of the vector x (n ≥ 0).n

COMPLEX for clacgvxCOMPLEX*16 for zlacgv.Array, dimension (1+(n-1)* |incx|).Contains the vector of length n to be conjugated.

1184


INTEGER. The spacing between successive elements of x.incx

Output Parameters

On exit, overwritten with conjg(x).x

?lacrmMultiplies a complex matrix by a square realmatrix.

Syntax

call clacrm( m, n, a, lda, b, ldb, c, ldc, rwork )

call zlacrm( m, n, a, lda, b, ldb, c, ldc, rwork )

Description

This routine performs a simple matrix-matrix multiplication of the form

C = A*B,

where A is m-by-n and complex, B is n-by-n and real, C is m-by-n and complex.

Input Parameters

INTEGER. The number of rows of the matrix A and of the

matrix C (m ≥ 0).

m

INTEGER. The number of columns and rows of the matrix Band the number of columns of the matrix C

n

(n ≥ 0).

COMPLEX for clacrmaCOMPLEX*16 for zlacrmArray, DIMENSION (lda, n). Contains the m-by-n matrixA.

INTEGER. The leading dimension of the array a, lda ≥max(1, m).

lda

REAL for clacrmbDOUBLE PRECISION for zlacrm

1185


Array, DIMENSION (ldb, n). Contains the n-by-n matrixB.

INTEGER. The leading dimension of the array b, ldb ≥ max(1,n).

ldb

INTEGER. The leading dimension of the output array c, ldc

≥ max(1, n).

ldc

REAL for clacrmrworkDOUBLE PRECISION for zlacrmWorkspace array, DIMENSION (2*m*n).

Output Parameters

COMPLEX for clacrmcCOMPLEX*16 for zlacrmArray, DIMENSION (ldc, n). Contains the m-by-n matrix C.

?lacrtPerforms a linear transformation of a pair ofcomplex vectors.

Syntax

call clacrt( n, cx, incx, cy, incy, c, s )

call zlacrt( n, cx, incx, cy, incy, c, s )

Description

This routine performs the following transformation

where c, s are complex scalars and x, y are complex vectors.

1186


Input Parameters

INTEGER. The number of elements in the vectors cx and cy

(n ≥ 0).

n

COMPLEX for clacrtcx, cyCOMPLEX*16 for zlacrtArrays, dimension (n).Contain input vectors x and y, respectively.

INTEGER. The increment between successive elements ofcx.

incx

INTEGER. The increment between successive elements ofcy.

incy

COMPLEX for clacrtc, sCOMPLEX*16 for zlacrtComplex scalars that define the transform matrix

Output Parameters

On exit, overwritten with c*x + s*y .cx

On exit, overwritten with -s*x + c*y .cy

?laesyComputes the eigenvalues and eigenvectors of a2-by-2 complex symmetric matrix, and checks thatthe norm of the matrix of eigenvectors is largerthan a threshold value.

Syntax

call claesy( a, b, c, rt1, rt2, evscal, cs1, sn1 )

call zlaesy( a, b, c, rt1, rt2, evscal, cs1, sn1 )

1187


Description

This routine performs the eigendecomposition of a 2-by-2 symmetric matrix

provided the norm of the matrix of eigenvectors is larger than some threshold value.

rt1 is the eigenvalue of larger absolute value, and rt2 of smaller absolute value. If theeigenvectors are computed, then on return (cs1, sn1) is the unit eigenvector for rt1, hence

Input Parameters

COMPLEX for claesya, b, cCOMPLEX*16 for zlaesyElements of the input matrix.

Output Parameters

COMPLEX for claesyrt1, rt2COMPLEX*16 for zlaesyEigenvalues of larger and smaller modulus, respectively.

COMPLEX for claesyevscalCOMPLEX*16 for zlaesyThe complex value by which the eigenvector matrix wasscaled to make it orthonormal. If evscal is zero, theeigenvectors were not computed. This means one of twothings: the 2-by-2 matrix could not be diagonalized, or thenorm of the matrix of eigenvectors before scaling was largerthan the threshold value thresh (set to 0.1E0).

1188


COMPLEX for claesycs1, sn1COMPLEX*16 for zlaesyIf evscal is not zero, then (cs1, sn1) is the unit righteigenvector for rt1.

?rotApplies a plane rotation with real cosine andcomplex sine to a pair of complex vectors.

Syntax

call crot( n, cx, incx, cy, incy, c, s )

call zrot( n, cx, incx, cy, incy, c, s )

Description

This routine applies a plane rotation, where the cosine (c) is real and the sine (s) is complex,and the vectors cx and cy are complex. This routine has its real equivalents in BLAS (see ?rotin Chapter 2).

Input Parameters

INTEGER. The number of elements in the vectors cx andcy.

n

COMPLEX for crotcx, cyCOMPLEX*16 for zrotArrays of dimension (n), contain input vectors x and y,respectively.

INTEGER. The increment between successive elements ofcx.

incx

INTEGER. The increment between successive elements ofcy.

incy

REAL for crotcDOUBLE PRECISION for zrot

COMPLEX for crotsCOMPLEX*16 for zrotValues that define a rotation

1189


where c*c + s*conjg(s) = 1.0.

Output Parameters

On exit, overwritten with c*x + s*y.cx

On exit, overwritten with -conjg(s)*x + c*y.cy

?spmvComputes a matrix-vector product for complexvectors using a complex symmetric packed matrix.

Syntax

call cspmv( uplo, n, alpha, ap, x, incx, beta, y, incy )

call zspmv( uplo, n, alpha, ap, x, incx, beta, y, incy )

Description

These routines perform a matrix-vector operation defined as

y := alpha*a*x + beta*y,

where:

alpha and beta are complex scalars,

x and y are n-element complex vectors

a is an n-by-n complex symmetric matrix, supplied in packed form.

These routines have their real equivalents in BLAS (see ?spmv in Chapter 2 ).

Input Parameters

CHARACTER*1. Specifies whether the upper or lowertriangular part of the matrix a is supplied in the packedarray ap, as follows:

uplo

1190


If uplo = 'U' or 'u', the upper triangular part of thematrix a is supplied in the array ap.If uplo = 'L' or 'l', the lower triangular part of thematrix a is supplied in the array ap .

INTEGER.nSpecifies the order of the matrix a.The value of n must be at least zero.

COMPLEX for cspmvalpha, betaCOMPLEX*16 for zspmvSpecify complex scalars alpha and beta. When beta issupplied as zero, then y need not be set on input.

COMPLEX for cspmvapCOMPLEX*16 for zspmvArray, DIMENSION at least ((n*(n + 1))/2). Before entry,with uplo = 'U' or 'u', the array ap must contain theupper triangular part of the symmetric matrix packedsequentially, column-by-column, so that ap(1) containsA(1, 1), ap(2) and ap(3) contain A(1, 2) and A(2, 2)respectively, and so on. Before entry, with uplo = 'L' or'l', the array ap must contain the lower triangular part ofthe symmetric matrix packed sequentially,column-by-column, so that ap(1) contains a(1, 1), ap(2)and ap(3) contain a(2, 1) and a(3, 1) respectively, andso on.

COMPLEX for cspmvxCOMPLEX*16 for zspmvArray, DIMENSION at least (1 + (n - 1)*abs(incx)).Before entry, the incremented array x must contain then-element vector x.

INTEGER. Specifies the increment for the elements of x. Thevalue of incx must not be zero.

incx

COMPLEX for cspmvyCOMPLEX*16 for zspmvArray, DIMENSION at least (1 + (n - 1)*abs(incy)).Before entry, the incremented array y must contain then-element vector y.

1191



incy

Output Parameters


?sprPerforms the symmetrical rank-1 update of acomplex symmetric packed matrix.

Syntax

call cspr( uplo, n, alpha, x, incx, ap )

call zspr( uplo, n, alpha, x, incx, ap )

Description

The ?spr routines perform a matrix-vector operation defined as

a:= alpha*x*conjg(x') + a,

where:

alpha is a complex scalar

x is an n-element complex vector

a is an n-by-n complex symmetric matrix, supplied in packed form.

These routines have their real equivalents in BLAS (see ?spr in Chapter 2).

Input Parameters

CHARACTER*1. Specifies whether the upper or lowertriangular part of the matrix a is supplied in the packedarray ap, as follows:

uplo

If uplo = 'U' or 'u', the upper triangular part of thematrix a is supplied in the array ap.If uplo = 'L' or 'l', the lower triangular part of thematrix a is supplied in the array ap .

INTEGER.n

1192


Specifies the order of the matrix a.The value of n must be at least zero.

COMPLEX for cspralphaCOMPLEX*16 for zsprSpecifies the scalar alpha.

COMPLEX for csprxCOMPLEX*16 for zsprArray, DIMENSION at least (1 + (n - 1)*abs(incx)). Beforeentry, the incremented array x must contain the n-elementvector x.


incx

COMPLEX for csprapCOMPLEX*16 for zsprArray, DIMENSION at least ((n*(n + 1))/2). Before entry,with uplo = 'U' or 'u', the array ap must contain the uppertriangular part of the symmetric matrix packed sequentially,column-by-column, so that ap(1) contains A(1,1), ap(2)and ap(3) contain A(1, 2) and A(2,2) respectively, andso on.Before entry, with uplo = 'L' or 'l', the array ap mustcontain the lower triangular part of the symmetric matrixpacked sequentially, column-by-column, so that ap(1)contains a(1,1), ap(2) and ap(3) contain a(2,1) anda(3,1) respectively, and so on.Note that the imaginary parts of the diagonal elements neednot be set, they are assumed to be zero, and on exit theyare set to zero.

Output Parameters


ap


1193


?symvComputes a matrix-vector product for a complexsymmetric matrix.

Syntax

call csymv( uplo, n, alpha, a, lda, x, incx, beta, y, incy )

call zsymv( uplo, n, alpha, a, lda, x, incx, beta, y, incy )

Description

These routines perform the matrix-vector operation defined as

y := alpha*a*x + beta*y,

where:

alpha and beta are complex scalars

x and y are n-element complex vectors

a is an n-by-n symmetric complex matrix.

These routines have their real equivalents in BLAS (see ?symv in Chapter 2).

Input Parameters


uplo

If uplo = 'U' or 'u', the upper triangular part of the arraya is to be referenced. If uplo = 'L' or 'l', the lowertriangular part of the array a is to be referenced.


n

COMPLEX for csymvalpha, betaCOMPLEX*16 for zsymvSpecify the scalars alpha and beta. When beta is suppliedas zero, then y need not be set on input.

COMPLEX for csymvaCOMPLEX*16 for zsymv

1194


Array, DIMENSION (lda, n). Before entry with uplo = 'U'or 'u', the leading n-by-n upper triangular part of the arraya must contain the upper triangular part of the symmetricmatrix and the strictly lower triangular part of a is notreferenced. Before entry with uplo = 'L' or 'l', theleading n-by-n lower triangular part of the array a mustcontain the lower triangular part of the symmetric matrixand the strictly upper triangular part of a is not referenced.

INTEGER. Specifies the first dimension of A as declared inthe calling (sub)program. The value of lda must be at leastmax(1,n).

lda

COMPLEX for csymvxCOMPLEX*16 for zsymvArray, DIMENSION at least (1 + (n - 1)*abs(incx)).Before entry, the incremented array x must contain then-element vector x.


incx

COMPLEX for csymvyCOMPLEX*16 for zsymvArray, DIMENSION at least (1 + (n - 1)*abs(incy)).Before entry, the incremented array y must contain then-element vector y.


incy

Output Parameters


1195


?syrPerforms the symmetric rank-1 update of acomplex symmetric matrix.

Syntax

call csyr( uplo, n, alpha, x, incx, a, lda )

call zsyr( uplo, n, alpha, x, incx, a, lda )

Description

These routines perform the symmetric rank 1 operation defined as

a := alpha*x*x' + a,

where:

alpha is a complex scalar

x is an n-element complex vector

a is an n-by-n complex symmetric matrix.

These routines have their real equivalents in BLAS (see ?syr in Chapter 2).

Input Parameters


uplo

If uplo = 'U' or 'u', the upper triangular part of the arraya is to be referenced. If uplo = 'L' or 'l', the lowertriangular part of the array a is to be referenced.


n

COMPLEX for csyralphaCOMPLEX*16 for zsyrSpecifies the scalar alpha.

COMPLEX for csyrxCOMPLEX*16 for zsyrArray, DIMENSION at least (1 + (n - 1)*abs(incx)).Before entry, the incremented array x must contain then-element vector x.

1196



incx

COMPLEX for csyraCOMPLEX*16 for zsyrArray, DIMENSION (lda, n). Before entry with uplo ='U' or 'u', the leading n-by-n upper triangular part of thearray a must contain the upper triangular part of thesymmetric matrix and the strictly lower triangular part of ais not referenced.Before entry with uplo = 'L' or 'l', the leading n-by-nlower triangular part of the array a must contain the lowertriangular part of the symmetric matrix and the strictly uppertriangular part of a is not referenced.

INTEGER. Specifies the first dimension of a as declared inthe calling (sub)program. The value of lda must be at leastmax(1,n).

lda

Output Parameters


a


i?max1Finds the index of the vector element whose realpart has maximum absolute value.

Syntax

index = icmax1( n, cx, incx )

index = izmax1( n, cx, incx )

1197


Description

Given a complex vector cx, the i?max1 functions return the index of the vector element whosereal part has maximum absolute value. These functions are based on the BLAS functionsicamax/izamax, but using the absolute value of the real part. They are designed for use withclacon/zlacon.

Input Parameters

INTEGER. Specifies the number of elements in the vectorcx.

n

COMPLEX for icmax1cxCOMPLEX*16 for izmax1Array, DIMENSION at least (1+(n-1)*abs(incx)).Contains the input vector.

INTEGER. Specifies the spacing between successive elementsof cx.

incx

Output Parameters

INTEGER. Contains the index of the vector element whosereal part has maximum absolute value.

index

?sum1Forms the 1-norm of the complex vector using thetrue absolute value.

Syntax

res = scsum1( n, cx, incx )

res = dzsum1( n, cx, incx )

Description

Given a complex vector cx, scsum1/dzsum1 functions take the sum of the absolute values ofvector elements and return a single/DOUBLE PRECISION result, respectively. These functionsare based on scasum/dzasum from Level 1 BLAS, but use the true absolute value and weredesigned for use with clacon/zlacon.

1198


Input Parameters

INTEGER. Specifies the number of elements in the vectorcx.

n

COMPLEX for scsum1cxCOMPLEX*16 for dzsum1Array, DIMENSION at least (1+(n-1)*abs(incx)).Contains the input vector whose elements will be summed.

INTEGER. Specifies the spacing between successive elementsof cx (incx > 0).

incx

Output Parameters

REAL for scsum1resDOUBLE PRECISION for dzsum1Contains the sum of absolute values.

?gbtf2Computes the LU factorization of a general bandmatrix using the unblocked version of thealgorithm.

Syntax

call sgbtf2( m, n, kl, ku, ab, ldab, ipiv, info )

call dgbtf2( m, n, kl, ku, ab, ldab, ipiv, info )

call cgbtf2( m, n, kl, ku, ab, ldab, ipiv, info )

call zgbtf2( m, n, kl, ku, ab, ldab, ipiv, info )

Description

The routine forms the LU factorization of a general real/complex m-by-n band matrix A with klsub-diagonals and ku super-diagonals. The routine uses partial pivoting with row interchangesand implements the unblocked version of the algorithm, calling Level 2 BLAS. See also ?gbtrf.

Input Parameters


1199




A (kl ≥ 0).

kl


of A (ku ≥ 0).

ku

REAL for sgbtf2abDOUBLE PRECISION for dgbtf2COMPLEX for cgbtf2COMPLEX*16 for zgbtf2.Array, DIMENSION (ldab,*).The array ab contains the matrix A in band storage (seeMatrix Arguments).The second dimension of ab must be at least max(1, n).

INTEGER. The first dimension of the array ab.ldab

(ldab ≥ 2kl + ku +1)

Output Parameters

Overwritten by details of the factorization. The diagonal andkl + ku super-diagonals of U are stored in the first 1 + kl+ ku rows of ab. The multipliers used during the factorizationare stored in the next kl rows.

ab

INTEGER.ipivArray, DIMENSION at least max(1,min(m,n)).The pivot indices: row i was interchanged with row ipiv(i).

INTEGER. If info =0, the execution is successful.infoIf info = -i, the i-th parameter had an illegal value.If info = i, uii is 0. The factorization has been completed,but U is exactly singular. Division by 0 will occur if you usethe factor U for solving a system of linear equations.

1200


?gebd2Reduces a general matrix to bidiagonal form usingan unblocked algorithm.

Syntax

call sgebd2( m, n, a, lda, d, e, tauq, taup, work, info )

call dgebd2( m, n, a, lda, d, e, tauq, taup, work, info )

call cgebd2( m, n, a, lda, d, e, tauq, taup, work, info )

call zgebd2( m, n, a, lda, d, e, tauq, taup, work, info )

Description

The routine reduces a general m-by-n matrix A to upper or lower bidiagonal form B by anorthogonal (unitary) transformation: Q'*A*P = B

If m ≥ n, B is upper bidiagonal; if m < n, B is lower bidiagonal.

The routine does not form the matrices Q and P explicitly, but represents them as products of

elementary reflectors. if m ≥ n,

Q = H(1)H(2)...H(n) and P = G(1)G(2)...G(n-1)

if m < n,

Q = H(1)H(2)...H(m-1) and P = G(1)G(2)...G(m)

Each H(i) and G(i) has the form

H(i) = I - tauq*v*v' and G(i) = I - taup*u*u'

where tauq and taup are scalars (real for sgebd2/dgebd2, complex for cgebd2/zgebd2), andv and u are vectors (real for sgebd2/dgebd2, complex for cgebd2/zgebd2).

Input Parameters



REAL for sgebd2a, workDOUBLE PRECISION for dgebd2COMPLEX for cgebd2

1201

COMPLEX*16 for zgebd2.Arrays:a(lda,*) contains the m-by-n general matrix A to bereduced. The second dimension of a must be at least max(1,n).work(*) is a workspace array, the dimension of work mustbe at least max(1, m, n).


Output Parameters

if m ≥ n, the diagonal and first super-diagonal of a areoverwritten with the upper bidiagonal matrix B. Elementsbelow the diagonal, with the array tauq, represent the

a

orthogonal/unitary matrix Q as a product of elementaryreflectors, and elements above the first superdiagonal, withthe array taup, represent the orthogonal/unitary matrix pas a product of elementary reflectors.if m < n, the diagonal and first sub-diagonal of a areoverwritten by the lower bidiagonal matrix B. Elementsbelow the first subdiagonal, with the array tauq, representthe orthogonal/unitary matrix Q as a product of elementaryreflectors, and elements above the diagonal, with the arraytaup, represent the orthogonal/unitary matrix p as a productof elementary reflectors.

REAL for single-precision flavorsdDOUBLE PRECISION for double-precision flavors.Array, DIMENSION at least max(1, min(m, n)).Contains the diagonal elements of the bidiagonal matrix B:d(i) = a(i, i).

REAL for single-precision flavorseDOUBLE PRECISION for double-precision flavors. Array,DIMENSION at least max(1, min(m, n) - 1).Contains the off-diagonal elements of the bidiagonal matrixB:

if m ≥ n, e(i) = a(i, i+1) for i = 1,2,..., n-1;if m < n, e(i) = a(i+1, i) for i = 1,2,..., m-1.

1202

REAL for sgebd2tauq, taupDOUBLE PRECISION for dgebd2COMPLEX for cgebd2COMPLEX*16 for zgebd2.Arrays, DIMENSION at least max (1, min(m, n)).Contain scalar factors of the elementary reflectors whichrepresent orthogonal/unitary matrices Q and p, respectively.


?gehd2Reduces a general square matrix to upperHessenberg form using an unblocked algorithm.

Syntax

call sgehd2( n, ilo, ihi, a, lda, tau, work, info )

call dgehd2( n, ilo, ihi, a, lda, tau, work, info )

call cgehd2( n, ilo, ihi, a, lda, tau, work, info )

call zgehd2( n, ilo, ihi, a, lda, tau, work, info )

Description

The routine reduces a real/complex general matrix A to upper Hessenberg form H by anorthogonal or unitary similarity transformation Q'*A*Q = H.

The routine does not form the matrix Q explicitly. Instead, Q is represented as a product ofelementary reflectors.

Input Parameters

INTEGER The order of the matrix A (n ≥ 0).n

INTEGER. It is assumed that A is already upper triangularin rows and columns 1:ilo -1 and ihi+1:n.

ilo, ihi

If A has been output by ?gebal, then

1203


ilo and ihi must contain the values returned by thatroutine. Otherwise they should be set to ilo = 1 and ihi

= n. Constraint: 1 ≤ ilo ≤ ihi ≤ max(1, n).

REAL for sgehd2a, workDOUBLE PRECISION for dgehd2COMPLEX for cgehd2COMPLEX*16 for zgehd2.Arrays:a (lda,*) contains the n-by-n matrix A to be reduced. Thesecond dimension of a must be at least max(1, n).work (n) is a workspace array.


Output Parameters

On exit, the upper triangle and the first subdiagonal of Aare overwritten with the upper Hessenberg matrix H andthe elements below the first subdiagonal, with the arraytau, represent the orthogonal/unitary matrix Q as a productof elementary reflectors. See Application Notes below.

a

REAL for sgehd2tauDOUBLE PRECISION for dgehd2COMPLEX for cgehd2COMPLEX*16 for zgehd2.Array, DIMENSION at least max (1, n-1).Contains the scalar factors of elementary reflectors. SeeApplication Notes below.


Application Notes

The matrix Q is represented as a product of (ihi - ilo) elementary reflectors

Q = H(ilo) H(ilo +1) ... H(ihi -1)

Each H(i) has the form

H(i) = I - tau*v*v'

1204


where tau is a real/complex scalar, and v is a real/complex vector with v(1:i) = 0, v(i+1)= 1 and v(ihi+1:n) = 0.

On exit, v(i+2:ihi) is stored in a(i+2:ihi, i) and tau in tau(i).

The contents of a are illustrated by the following example, with n = 7, ilo = 2 and ihi =6:

where a denotes an element of the original matrix A, h denotes a modified element of the upperHessenberg matrix H, and vi denotes an element of the vector defining H(i).

?gelq2Computes the LQ factorization of a generalrectangular matrix using an unblocked algorithm.

Syntax

call sgelq2( m, n, a, lda, tau, work, info )

call dgelq2( m, n, a, lda, tau, work, info )

call cgelq2( m, n, a, lda, tau, work, info )

call zgelq2( m, n, a, lda, tau, work, info )

Description

The routine computes an LQ factorization of a real/complex m-by-n matrix A as A = L*Q.

1205


The routine does not form the matrix Q explicitly. Instead, Q is represented as a product ofmin(m, n) elementary reflectors :

Q = H(k) ... H(2) H(1) (or Q = H(k)' ... H(2)' H(1)' for complex flavors), where k= min(m, n)


H(i) = I - tau*v*v'

where tau is a real/complex scalar stored in tau(i), and v is a real/complex vector with v(1:i-1)= 0 and v(i) = 1.

On exit, v(i+1:n) is stored in a(i, i+1:n).

Input Parameters



REAL for sgelq2a, workDOUBLE PRECISION for dgelq2COMPLEX for cgelq2COMPLEX*16 for zgelq2.Arrays: a(lda,*) contains the m-by-n matrix A. The seconddimension of a must be at least max(1, n).work(m) is a workspace array.


Output Parameters

Overwritten by the factorization data as follows:aon exit, the elements on and below the diagonal of the arraya contain the m-by-min(n,m) lower trapezoidal matrix L (L

is lower triangular if n ≥ m); the elements above thediagonal, with the array tau, represent theorthogonal/unitary matrix Q as a product of min(n,m)elementary reflectors.

REAL for sgelq2tauDOUBLE PRECISION for dgelq2COMPLEX for cgelq2

1206


COMPLEX*16 for zgelq2.Array, DIMENSION at least max(1, min(m, n)).Contains scalar factors of the elementary reflectors.


?geql2Computes the QL factorization of a generalrectangular matrix using an unblocked algorithm.

Syntax

call sgeql2( m, n, a, lda, tau, work, info )

call dgeql2( m, n, a, lda, tau, work, info )

call cgeql2( m, n, a, lda, tau, work, info )

call zgeql2( m, n, a, lda, tau, work, info )

Description

The routine computes a QL factorization of a real/complex m-by-n matrix A as A = Q*L.


Q = H(k) ... H(2) H(1), where k = min(m, n)


H(i) = I - tau*v*v'

where tau is a real/complex scalar stored in tau(i), and v is a real/complex vector withv(m-k+i+1:m) = 0 and v(m-k+i) = 1.

On exit, v(1:m-k+i-1) is stored in a(1:m-k+i-1, n-k+i).

Input Parameters



1207


REAL for sgeql2a, workDOUBLE PRECISION for dgeql2COMPLEX for cgeql2COMPLEX*16 for zgeql2.Arrays:a(lda,*) contains the m-by-n matrix A.The second dimension of a must be at least max(1, n).work(m) is a workspace array.


Output Parameters


on exit, if m ≥ n, the lower triangle of the subarraya(m-n+1:m, 1:n) contains the n-by-n lower triangularmatrix L; if m < n, the elements on and below the (n-m)thsuperdiagonal contain the m-by-n lower trapezoidal matrixL; the remaining elements, with the array tau, representthe orthogonal/unitary matrix Q as a product of elementaryreflectors.

REAL for sgeql2tauDOUBLE PRECISION for dgeql2COMPLEX for cgeql2COMPLEX*16 for zgeql2.Array, DIMENSION at least max(1, min(m, n)).Contains scalar factors of the elementary reflectors.


1208

?geqr2Computes the QR factorization of a generalrectangular matrix using an unblocked algorithm.

Syntax

call sgeqr2( m, n, a, lda, tau, work, info )

call dgeqr2( m, n, a, lda, tau, work, info )

call cgeqr2( m, n, a, lda, tau, work, info )

call zgeqr2( m, n, a, lda, tau, work, info )

Description

The routine computes a QR factorization of a real/complex m-by-n matrix A as A = Q*R.


Q = H(1)H(2) ... H(k), where k = min(m, n)


H(i) = I - tau*v*v'

where tau is a real/complex scalar stored in tau(i), and v is a real/complex vector with v(1:i-1)= 0 and v(i) = 1.

On exit, v(i+1:m) is stored in a(i+1:m, i).

Input Parameters



REAL for sgeqr2a, workDOUBLE PRECISION for dgeqr2COMPLEX for cgeqr2COMPLEX*16 for zgeqr2.Arrays:a(lda,*) contains the m-by-n matrix A.The second dimension of a must be at least max(1, n).

1209


work(n) is a workspace array.


Output Parameters

Overwritten by the factorization data as follows:aon exit, the elements on and above the diagonal of the arraya contain the min(n,m)-by-n upper trapezoidal matrix R (R

is upper triangular if m ≥ n); the elements below thediagonal, with the array tau, represent theorthogonal/unitary matrix Q as a product of elementaryreflectors.

REAL for sgeqr2tauDOUBLE PRECISION for dgeqr2COMPLEX for cgeqr2COMPLEX*16 for zgeqr2.Array, DIMENSION at least max(1, min(m, n)).Contains scalar factors of the elementary reflectors.


?gerq2Computes the RQ factorization of a generalrectangular matrix using an unblocked algorithm.

Syntax

call sgerq2( m, n, a, lda, tau, work, info )

call dgerq2( m, n, a, lda, tau, work, info )

call cgerq2( m, n, a, lda, tau, work, info )

call zgerq2( m, n, a, lda, tau, work, info )

Description

The routine computes a RQ factorization of a real/complex m-by-n matrix A as A = R*Q.

1210



Q = H(1)H(2) ... H(k), where k = min(m, n)


H(i) = I - tau*v*v'

where tau is a real/complex scalar stored in tau(i), and v is a real/complex vector withv(n-k+i+1:n) = 0 and v(n-k+i) = 1.

On exit, v(1:n-k+i-1) is stored in a(m-k+i, 1:n-k+i-1).

Input Parameters



REAL for sgerq2a, workDOUBLE PRECISION for dgerq2COMPLEX for cgerq2COMPLEX*16 for zgerq2.Arrays:a(lda,*) contains the m-by-n matrix A.The second dimension of a must be at least max(1, n).work(m) is a workspace array.


Output Parameters


on exit, if m ≤ n, the upper triangle of the subarray a(1:m,n-m+1:n ) contains the m-by-m upper triangular matrix R;if m > n, the elements on and above the (m-n)-thsubdiagonal contain the m-by-n upper trapezoidal matrix R;the remaining elements, with the array tau, represent theorthogonal/unitary matrix Q as a product of elementaryreflectors.

REAL for sgerq2tauDOUBLE PRECISION for dgerq2

1211


COMPLEX for cgerq2COMPLEX*16 for zgerq2.Array, DIMENSION at least max(1, min(m, n)).Contains scalar factors of the elementary reflectors.


?gesc2Solves a system of linear equations using the LUfactorization with complete pivoting computed by?getc2.

Syntax

call sgesc2( n, a, lda, rhs, ipiv, jpiv, scale )

call dgesc2( n, a, lda, rhs, ipiv, jpiv, scale )

call cgesc2( n, a, lda, rhs, ipiv, jpiv, scale )

call zgesc2( n, a, lda, rhs, ipiv, jpiv, scale )

Description

This routine solves a system of linear equations

A*X = scale*RHS

with a general n-by-n matrix A using the LU factorization with complete pivoting computed by?getc2.

Input Parameters

INTEGER. The order of the matrix A.n

REAL for sgesc2a, rhsDOUBLE PRECISION for dgesc2COMPLEX for cgesc2COMPLEX*16 for zgesc2.Arrays:a(lda,*) contains the LU part of the factorization of then-by-n matrix A computed by ?getc2:

1212


A = P*L*U*Q.The second dimension of a must be at least max(1, n);rhs(n) contains on entry the right hand side vector for thesystem of equations.


INTEGER.ipivArray, DIMENSION at least max(1,n).

The pivot indices: for 1 ≤ i ≤ n, row i of the matrix hasbeen interchanged with row ipiv(i).

INTEGER.jpivArray, DIMENSION at least max(1,n).

The pivot indices: for 1 ≤ j ≤ n, column j of the matrixhas been interchanged with column jpiv(j).

Output Parameters

On exit, overwritten with the solution vector X.rhs

REAL for sgesc2/cgesc2scaleDOUBLE PRECISION for dgesc2/zgesc2

Contains the scale factor. scale is chosen in the range 0 ≤

scale ≤ 1 to prevent overflow in the solution.

?getc2Computes the LU factorization with completepivoting of the general n-by-n matrix.

Syntax

call sgetc2( n, a, lda, ipiv, jpiv, info )

call dgetc2( n, a, lda, ipiv, jpiv, info )

call cgetc2( n, a, lda, ipiv, jpiv, info )

call zgetc2( n, a, lda, ipiv, jpiv, info )

1213


Description

This routine computes an LU factorization with complete pivoting of the n-by-n matrix A. Thefactorization has the form A = P*L*U*Q, where p and Q are permutation matrices, L is lowertriangular with unit diagonal elements and U is upper triangular.

The LU factorization computed by this routine is used by ?latdf to compute a contribution tothe reciprocal Dif-estimate.

Input Parameters


REAL for sgetc2aDOUBLE PRECISION for dgetc2COMPLEX for cgetc2COMPLEX*16 for zgetc2.Array a(lda,*) contains the n-by-n matrix A to be factored.The second dimension of a must be at least max(1, n);


Output Parameters

On exit, the factors L and U from the factorization A =P*L*U*Q; the unit diagonal elements of L are not stored. IfU(k, k) appears to be less than smin, U(k, k) is given thevalue of smin, i.e., giving a nonsingular perturbed system.

a

INTEGER.ipivArray, DIMENSION at least max(1,n).

The pivot indices: for 1 ≤ i ≤ n , row i of the matrix hasbeen interchanged with row ipiv(i).

INTEGER.jpivArray, DIMENSION at least max(1,n).

The pivot indices: for 1 ≤ j ≤ n , column j of the matrixhas been interchanged with column jpiv(j).


1214


If info = k >0, U(k, k) is likely to produce overflow if wetry to solve for x in A*x = b. So U is perturbed to avoid theoverflow.

?getf2Computes the LU factorization of a general m-by-nmatrix using partial pivoting with row interchanges(unblocked algorithm).

Syntax

call sgetf2( m, n, a, lda, ipiv, info )

call dgetf2( m, n, a, lda, ipiv, info )

call cgetf2( m, n, a, lda, ipiv, info )

call zgetf2( m, n, a, lda, ipiv, info )

Description

The routine computes the LU factorization of a general m-by-n matrix A using partial pivotingwith row interchanges. The factorization has the form

A = P*L*U

where p is a permutation matrix, L is lower triangular with unit diagonal elements (lowertrapezoidal if m > n) and U is upper triangular (upper trapezoidal if m < n).

Input Parameters



REAL for sgetf2aDOUBLE PRECISION for dgetf2COMPLEX for cgetf2COMPLEX*16 for zgetf2.Array, DIMENSION (lda,*). Contains the matrix A to befactored. The second dimension of a must be at least max(1,n).


1215

Output Parameters

Overwritten by L and U. The unit diagonal elements of L arenot stored.

a

INTEGER.ipivArray, DIMENSION at least max(1,min(m,n)).

The pivot indices: for 1 ≤ i ≤ n , row i was interchangedwith row ipiv(i).

INTEGER. If info=0, the execution is successful.infoIf info = -i, the i-th parameter had an illegal value.If info = i >0, uii is 0. The factorization has beencompleted, but U is exactly singular. Division by 0 will occurif you use the factor U for solving a system of linearequations.

?gtts2Solves a system of linear equations with atridiagonal matrix using the LU factorizationcomputed by ?gttrf.

Syntax

call sgtts2( itrans, n, nrhs, dl, d, du, du2, ipiv, b, ldb )

call dgtts2( itrans, n, nrhs, dl, d, du, du2, ipiv, b, ldb )

call cgtts2( itrans, n, nrhs, dl, d, du, du2, ipiv, b, ldb )

call zgtts2( itrans, n, nrhs, dl, d, du, du2, ipiv, b, ldb )

Description

This routine solves for X one of the following systems of linear equations with multiple righthand sides:

A*X = B, AT*X = B, or AH*X = B (for complex matrices only), with a tridiagonal matrix Ausing the LU factorization computed by ?gttrf.

Input Parameters

INTEGER. Must be 0, 1, or 2.itrans

1216


Indicates the form of the equations being solved:If itrans = 0, then A*X = B (no transpose).If itrans = 1, then AT*X = B (transpose).If itrans = 2, then AH*X = B (conjugate transpose).


INTEGER. The number of right-hand sides, i.e., the number

of columns in B (nrhs ≥ 0).

nrhs

REAL for sgtts2dl,d,du,du2,bDOUBLE PRECISION for dgtts2COMPLEX for cgtts2COMPLEX*16 for zgtts2.Arrays: dl(n - 1), d(n ), du(n - 1), du2(n - 2),b(ldb,nrhs).The array dl contains the (n - 1) multipliers that define thematrix L from the LU factorization of A.The array d contains the n diagonal elements of the uppertriangular matrix U from the LU factorization of A.The array du contains the (n - 1) elements of the firstsuper-diagonal of U.The array du2 contains the (n - 2) elements of the secondsuper-diagonal of U.The array b contains the matrix B whose columns are theright-hand sides for the systems of equations.

INTEGER. The leading dimension of b; must be ldb ≥max(1, n).

ldb

INTEGER.ipivArray, DIMENSION (n).The pivot indices array, as returned by ?gttrf.

Output Parameters


1217


?isnanTests input for NaN.

Syntax

val = sisnan( sin )

val = disnan( din )

Description

This logical routine returns .TRUE. if its argument is NaN, and .FALSE. otherwise.

Input Parameters

REAL for sisnansinInput to test for NaN.

DOUBLE PRECISION for disnandinInput to test for NaN.

Output Parameters

Logical. Result of the test.val

?laisnanTests input for NaN.

Syntax

val = slaisnan( sin1, sin2 )

val = dlaisnan( din1, din2 )

Description

This logical routine checks for NaNs (NaN stands for 'Not A Number') by comparing its two

arguments for inequality. NaN is the only floating-point value where NaN ≠ NaN returns .TRUE.To check for NaNs, pass the same variable as both arguments.

This routine is not for general use. It exists solely to avoid over-optimization in ?isnan.

1218


Input Parameters

REAL for sisnansin1, sin2Two numbers to compare for inequality.

DOUBLE PRECISION for disnandin2, din2Two numbers to compare for inequality.

Output Parameters

Logical. Result of the comparison.val

?labrdReduces the first nb rows and columns of a generalmatrix to a bidiagonal form.

Syntax

call slabrd( m, n, nb, a, lda, d, e, tauq, taup, x, ldx, y, ldy )

call dlabrd( m, n, nb, a, lda, d, e, tauq, taup, x, ldx, y, ldy )

call clabrd( m, n, nb, a, lda, d, e, tauq, taup, x, ldx, y, ldy )

call zlabrd( m, n, nb, a, lda, d, e, tauq, taup, x, ldx, y, ldy )

Description

The routine reduces the first nb rows and columns of a general m-by-n matrix A to upper orlower bidiagonal form by an orthogonal/unitary transformation Q'*A*P, and returns the matricesX and Y which are needed to apply the transformation to the unreduced part of A.

if m ≥ n, A is reduced to upper bidiagonal form; if m < n, to lower bidiagonal form.

The matrices Q and P are represented as products of elementary reflectors: Q = H(1) H(2)... H(nb) and P = G(1) G(2) ... G(nb)

Each H(i) and G(i) has the form

H(i) = I - tauq*v*v' and G(i) = I - taup*u*u'

where tauq and taup are scalars, and v and u are vectors.

1219

The elements of the vectors v and u together form the m-by-nb matrix V and the nb-by-n matrixU' which are needed, with X and Y, to apply the transformation to the unreduced part of thematrix, using a block update of the form: A := A - V*Y' - X*U'.

This is an auxiliary routine called by ?gebrd.

Input Parameters



INTEGER. The number of leading rows and columns of A tobe reduced.

nb

REAL for slabrdaDOUBLE PRECISION for dlabrdCOMPLEX for clabrdCOMPLEX*16 for zlabrd.Array a(lda,*) contains the matrix A to be reduced. Thesecond dimension of a must be at least max(1, n).


INTEGER. The first dimension of the output array x; mustbeat least max(1, m).

ldx

INTEGER. The first dimension of the output array y; mustbeat least max(1, n).

ldy

Output Parameters

On exit, the first nb rows and columns of the matrix areoverwritten; the rest of the array is unchanged.

a

if m ≥ n, elements on and below the diagonal in the first nbcolumns, with the array tauq, represent theorthogonal/unitary matrix Q as a product of elementaryreflectors; and elements above the diagonal in the first nbrows, with the array taup, represent the orthogonal/unitarymatrix p as a product of elementary reflectors.if m < n, elements below the diagonal in the first nbcolumns, with the array tauq, represent theorthogonal/unitary matrix Q as a product of elementary

1220

reflectors, and elements on and above the diagonal in thefirst nb rows, with the array taup, represent theorthogonal/unitary matrix p as a product of elementaryreflectors.

REAL for single-precision flavorsd, eDOUBLE PRECISION for double-precision flavors. Arrays,DIMENSION (nb) each. The array d contains the diagonalelements of the first nb rows and columns of the reducedmatrix:d(i) = a(i,i).The array e contains the off-diagonal elements of the firstnb rows and columns of the reduced matrix.

REAL for slabrdtauq, taupDOUBLE PRECISION for dlabrdCOMPLEX for clabrdCOMPLEX*16 for zlabrd.Arrays, DIMENSION (nb) each. Contain scalar factors of theelementary reflectors which represent the orthogonal/unitarymatrices Q and P, respectively.

REAL for slabrdx, yDOUBLE PRECISION for dlabrdCOMPLEX for clabrdCOMPLEX*16 for zlabrd.Arrays, dimension x(ldx, nb), y(ldy, nb).The array x contains the m-by-nb matrix X required to updatethe unreduced part of A.The array y contains the n-by-nb matrix Y required to updatethe unreduced part of A.

Application Notes

if m ≥ n, then for the elementary reflectors H(i) and G(i),

v(1:i-1) = 0, v(i) = 1, and v(i:m) is stored on exit in a(i:m, i); u(1:i) = 0, u(i+1)= 1, and u(i+1:n) is stored on exit in a(i, i+1:n);

tauq is stored in tauq(i) and taup in taup(i).

if m < n,

1221

v(1:i) = 0, v(i+1) = 1, and v(i+1:m) is stored on exit in a(i+2:m, i) ; u(1:i-1) = 0,u(i) = 1, and u(i:n) is stored on exit in a(i, i+1:n); tauq is stored in tauq(i) and taupin taup(i).

The contents of a on exit are illustrated by the following examples with nb = 2:

where a denotes an element of the original matrix which is unchanged, vi denotes an elementof the vector defining H(i), and ui an element of the vector defining G(i).

?lacn2Estimates the 1-norm of a square matrix, usingreverse communication for evaluating matrix-vectorproducts.

Syntax

call slacn2( n, v, x, isgn, est, kase, isave )

call dlacn2( n, v, x, isgn, est, kase, isave )

call clacn2( n, v, x, est, kase, isave )

call zlacn2( n, v, x, est, kase, isave )

1222


Description

This routine estimates the 1-norm of a square, real or complex matrix A. Reverse communicationis used for evaluating matrix-vector products.

Input Parameters


REAL for slacn2v, xDOUBLE PRECISION for dlacn2COMPLEX for clacn2COMPLEX*16 for zlacn2.Arrays, DIMENSION (n) each.v is a workspace array.x is used as input after an intermediate return.

INTEGER.isgnWorkspace array, DIMENSION (n), used with real flavorsonly.

REAL for slacn2/clacn2estDOUBLE PRECISION for dlacn2/zlacn2On entry with kase set to 1 or 2, and isave(1) = 1, estmust be unchanged from the previous call to the routine.

INTEGER.kaseOn the initial call to the routine, kase must be set to 0.

INTEGER. Array, DIMENSION (3).isaveContains variables from the previous call to the routine.

Output Parameters

An estimate (a lower bound) for norm(A).est

On an intermediate return, kase is set to 1 or 2, indicatingwhether x should be overwritten by A*x or A'*x.

kase

On the final return, kase is set to 0.

On the final return, v = A*w, where est =norm(v)/norm(w) (w is not returned).

v

On an intermediate return, x should be overwritten byxA*x , if kase = 1,

1223


A'*x , if kase = 2,(A' is the conjugate transpose of A), and the routine mustbe re-called with all the other parameters unchanged.

This parameter is used to save variables between calls tothe routine.

isave

?laconEstimates the 1-norm of a square matrix, usingreverse communication for evaluating matrix-vectorproducts.

Syntax

call slacon( n, v, x, isgn, est, kase )

call dlacon( n, v, x, isgn, est, kase )

call clacon( n, v, x, est, kase )

call zlacon( n, v, x, est, kase )

Description

This routine estimates the 1-norm of a square, real/complex matrix A. Reverse communicationis used for evaluating matrix-vector products.

Input Parameters


REAL for slaconv, xDOUBLE PRECISION for dlaconCOMPLEX for claconCOMPLEX*16 for zlacon.Arrays, DIMENSION (n) each.v is a workspace array.x is used as input after an intermediate return.

INTEGER.isgnWorkspace array, DIMENSION (n), used with real flavorsonly.

REAL for slacon/claconest

1224


DOUBLE PRECISION for dlacon/zlaconAn estimate that with kase=1 or 2 should be unchangedfrom the previous call to ?lacon.

INTEGER.kaseOn the initial call to ?lacon, kase should be 0.

Output Parameters

REAL for slacon/claconestDOUBLE PRECISION for dlacon/zlaconAn estimate (a lower bound) for norm(A).

On an intermediate return, kase will be 1 or 2, indicatingwhether x should be overwritten by A*x or A'*x. On thefinal return from ?lacon, kase will again be 0.

kase

On the final return, v = A*w, where est =norm(v)/norm(w) (w is not returned).

v

On an intermediate return, x should be overwritten byxA*x , if kase = 1,A'*x , if kase = 2,(where for complex flavors A' is the conjugate transposeof A), and ?lacon must be re-called with all the otherparameters unchanged.

?lacpyCopies all or part of one two-dimensional array toanother.

Syntax

call slacpy( uplo, m, n, a, lda, b, ldb )

call dlacpy( uplo, m, n, a, lda, b, ldb )

call clacpy( uplo, m, n, a, lda, b, ldb )

call zlacpy( uplo, m, n, a, lda, b, ldb )

Description

This routine copies all or part of a two-dimensional matrix A to another matrix B.

1225


Input Parameters

CHARACTER*1.uploSpecifies the part of the matrix A to be copied to B.If uplo = 'U', the upper triangular part of A is copied.If uplo = 'L', the lower triangular part of A is copied.Otherwise, all of the matrix A is copied.



REAL for slacpyaDOUBLE PRECISION for dlacpyCOMPLEX for clacpyCOMPLEX*16 for zlacpy.Array a(lda,*), contains the m-by-n matrix A.The second dimension of a must be at least max(1,n).If uplo = 'U', only the upper triangle or trapezoid isaccessed; if uplo = 'L', only the lower triangle ortrapezoid is accessed.

INTEGER. The first dimension of a; lda ≥ max(1, m).lda

INTEGER. The first dimension of the output array b; ldb ≥max(1, m).

ldb

Output Parameters

REAL for slacpybDOUBLE PRECISION for dlacpyCOMPLEX for clacpyCOMPLEX*16 for zlacpy.Array b(ldb,*), contains the m-by-n matrix B.The second dimension of b must be at least max(1,n).On exit, B = A in the locations specified by uplo.

1226


?ladivPerforms complex division in real arithmetic,avoiding unnecessary overflow.

Syntax

call sladiv( a, b, c, d, p, q )

call dladiv( a, b, c, d, p, q )

res = cladiv( x, y )

res = zladiv( x, y )

Description

The routines sladiv/dladiv perform complex division in real arithmetic as

Complex functions cladiv/zladiv compute the result as

res = x/y,

where x and y are complex. The computation of x / y will not overflow on an intermediary stepunless the results overflows.

Input Parameters

REAL for sladiva, b, c, dDOUBLE PRECISION for dladivThe scalars a, b, c, and d in the above expression (for realflavors only).

COMPLEX for cladivx, yCOMPLEX*16 for zladivThe complex scalars x and y (for complex flavors only).

Output Parameters

REAL for sladivp, q

1227


DOUBLE PRECISION for dladivThe scalars p and q in the above expression (for real flavorsonly).

COMPLEX for cladivresDOUBLE COMPLEX for zladivContains the result of division x / y.

?lae2Computes the eigenvalues of a 2-by-2 symmetricmatrix.

Syntax

call slae2( a, b, c, rt1, rt2 )

call dlae2( a, b, c, rt1, rt2 )

Description

The routines sla2/dlae2 compute the eigenvalues of a 2-by-2 symmetric matrix

On return, rt1 is the eigenvalue of larger absolute value, and rt1 is the eigenvalue of smallerabsolute value.

Input Parameters

REAL for slae2a, b, cDOUBLE PRECISION for dlae2The elements a, b, and c of the 2-by-2 matrix above.

Output Parameters

REAL for slae2rt1, rt2DOUBLE PRECISION for dlae2

1228


The computed eigenvalues of larger and smaller absolutevalue, respectively.

Application Notes

rt1 is accurate to a few ulps barring over/underflow. rt2 may be inaccurate if there is massivecancellation in the determinant a*c-b*b; higher precision or correctly rounded or correctlytruncated arithmetic would be needed to compute rt2 accurately in all cases.

Overflow is possible only if rt1 is within a factor of 5 of overflow. Underflow is harmless if theinput data is 0 or exceeds

underflow_threshold / macheps.

?laebzComputes the number of eigenvalues of a realsymmetric tridiagonal matrix which are less thanor equal to a given value, and performs other tasksrequired by the routine ?stebz.

Syntax

call slaebz( ijob, nitmax, n, mmax, minp, nbmin, abstol, reltol, pivmin, d,e, e2, nval, ab, c, mout, nab, work, iwork, info )

call dlaebz( ijob, nitmax, n, mmax, minp, nbmin, abstol, reltol, pivmin, d,e, e2, nval, ab, c, mout, nab, work, iwork, info )

Description

The routine ?laebz contains the iteration loops which compute and use the function n(w), whichis the count of eigenvalues of a symmetric tridiagonal matrix T less than or equal to its argumentw. It performs a choice of two types of loops:

=1, followed byijob

=2: It takes as input a list of intervals and returns a list of sufficientlysmall intervals whose union contains the same eigenvalues as the unionof the original intervals. The input intervals are (ab(j,1),ab(j,2)],j=1,...,minp. The output interval (ab(j,1),ab(j,2)] will contain

eigenvalues nab(j,1)+1,...,nab(j,2), where 1 ≤ j ≤ mout.

ijob

1229


=3: It performs a binary search in each input interval(ab(j,1),ab(j,2)] for a point w(j) such that n(w(j))=nval(j),and uses c(j) as the starting point of the search. If such a w(j) is found,

ijob

then on output ab(j,1)=ab(j,2)=w. If no such w(j) is found, then onoutput (ab(j,1),ab(j,2)] will be a small interval containing the pointwhere n(w) jumps through nval(j), unless that point lies outside theinitial interval.

Note that the intervals are in all cases half-open intervals, that is, of the form (a,b], whichincludes b but not a .

To avoid underflow, the matrix should be scaled so that its largest element is no greater thanoverflow**(1/2) * overflow**(1/4) in absolute value. To assure the most accurate computationof small eigenvalues, the matrix should be scaled to be not much smaller than that, either.

NOTE. In general, the arguments are not checked for unreasonable values.

Input Parameters

INTEGER. Specifies what is to be done:ijob= 1: Compute nab for the initial intervals.= 2: Perform bisection iteration to find eigenvalues of T.= 3: Perform bisection iteration to invert n(w), i.e., to finda point which has a specified number of eigenvalues of T toits left. Other values will cause ?laebz to return withinfo=-1.

INTEGER. The maximum number of "levels" of bisection tobe performed, i.e., an interval of width W will not be madesmaller than 2**(-nitmax)*W. If not all intervals haveconverged after nitmax iterations, then info is set to thenumber of non-converged intervals.

nitmax

INTEGER. The dimension n of the tridiagonal matrix T. Itmust be at least 1.

n

INTEGER. The maximum number of intervals. If more thanmmax intervals are generated, then ?laebz will quit withinfo=mmax+1.

mmax

1230


INTEGER. The initial number of intervals. It may not begreater than mmax.

minp

INTEGER. The smallest number of intervals that should beprocessed using a vector loop. If zero, then only the scalarloop will be used.

nbmin

REAL for slaebzabstolDOUBLE PRECISION for dlaebz.The minimum (absolute) width of an interval. When aninterval is narrower than abstol, or than reltol times thelarger (in magnitude) endpoint, then it is considered to besufficiently small, i.e., converged. This must be at least zero.

REAL for slaebzreltolDOUBLE PRECISION for dlaebz.The minimum relative width of an interval. When an intervalis narrower than abstol, or than reltol times the larger(in magnitude) endpoint, then it is considered to besufficiently small, i.e., converged. Note: this should alwaysbe at least radix*machine epsilon.

REAL for slaebzpivminDOUBLE PRECISION for dlaebz.The minimum absolute value of a "pivot" in the Sturmsequence loop. This value must be at least (max|e(j)**2|*safe_min) and at least safe_min, wheresafe_min is at least the smallest number that can divideone without overflow.

REAL for slaebzd, e, e2DOUBLE PRECISION for dlaebz.Arrays, dimension (n) each. The array d contains thediagonal elements of the tridiagonal matrix T.The array e contains the off-diagonal elements of thetridiagonal matrix T in positions 1 through n-1. e(n)visarbitrary.The array e2 contains the squares of the off-diagonalelements of the tridiagonal matrix T. e2(n) is ignored.

INTEGER.nvalArray, dimension (minp).If ijob=1 or 2, not referenced.

1231


If ijob=3, the desired values of n(w).

REAL for slaebzabDOUBLE PRECISION for dlaebz.Array, dimension (mmax,2) The endpoints of the intervals.ab(j,1) is a(j), the left endpoint of the j-th interval, andab(j,2) is b(j), the right endpoint of the j-th interval.

REAL for slaebzcDOUBLE PRECISION for dlaebz.Array, dimension (mmax)If ijob=1, ignored.If ijob=2, workspace.If ijob=3, then on input c(j) should be initialized to thefirst search point in the binary search.

INTEGER.nabArray, dimension (mmax,2)If ijob=2, then on input, nab(i,j) should be set. It mustsatisfy the condition:

n(ab(i,1)) ≤ nab(i,1) ≤ nab(i,2) ≤ n(ab(i,2)),which means that in interval i only eigenvaluesnab(i,1)+1,...,nab(i,2) are considered. Usually,nab(i,j)=n(ab(i,j)), from a previous call to ?laebzwith ijob=1.If ijob=3, normally, nab should be set to some distinctivevalue(s) before ?laebz is called.

REAL for slaebzworkDOUBLE PRECISION for dlaebz.Workspace array, dimension (mmax).

INTEGER.iworkWorkspace array, dimension (mmax).

Output Parameters

The elements of nval will be reordered to correspond withthe intervals in ab. Thus, nval(j) on output will not, ingeneral be the same as nval(j) on input, but it willcorrespond with the interval (ab(j,1),ab(j,2)] on output.

nval

1232


The input intervals will, in general, be modified, split, andreordered by the calculation.

ab

INTEGER.moutIf ijob=1, the number of eigenvalues in the intervals.If ijob=2 or 3, the number of intervals output.If ijob=3, mout will equal minp.

If ijob=1, then on output nab(i,j) will be set to N(ab(i,j)).nabIf ijob=2, then on output, nab(i,j) will contain max(na(k,min(nb(k), N(ab(i,j)))), where k is the index of the inputinterval that the output interval (ab(j,1),ab(j,2)] came from,and na(k) and nb(k) are the input values of nab(k,1) andnab(k,2).If ijob=3, then on output, nab(i,j) contains N(ab(i,j)), unlessN(w) > nval(i) for all search points w, in which case nab(i,1)will not be modified, i.e., the output value will be the sameas the input value (modulo reorderings, see nval and ab),or unless N(w) < nval(i) for all search points w, in whichcase nab(i,2) will not be modified.

INTEGER.info

All intervals converged.0:The last info intervals did not converge.1--mmax:

More than mmax intervals were generated.mmax+1:

Application Notes

This routine is intended to be called only by other LAPACK routines, thus the interface is lessuser-friendly. It is intended for two purposes:

(a) finding eigenvalues. In this case, ?laebz should have one or more initial intervals set upin ab, and ?laebz should be called with ijob=1. This sets up nab, and also counts theeigenvalues. Intervals with no eigenvalues would usually be thrown out at this point. Also, ifnot all the eigenvalues in an interval i are desired, nab(i,1) can be increased or nab(i,2)decreased. For example, set nab(i,1)=nab(i,2)-1 to get the largest eigenvalue. ?laebz is thencalled with ijob=2 and mmax no smaller than the value of mout returned by the call with ijob=1.After this (ijob=2) call, eigenvalues nab(i,1)+1 through nab(i,2) are approximately ab(i,1) (orab(i,2)) to the tolerance specified by abstol and reltol.

1233

(b) finding an interval (a',b'] containing eigenvalues w(f),...,w(l). In this case, start with aGershgorin interval (a,b). Set up ab to contain 2 search intervals, both initially (a,b). One nvalelement should contain f-1 and the other should contain l, while c should contain a and b,respectively. nab(i,1) should be -1 and nab(i,2) should be n+1, to flag an error if the desiredinterval does not lie in (a,b). ?laebz is then called with ijob=3. On exit, if w(f-1) < w(f),then one of the intervals -- j -- will have ab(j,1)=ab(j,2) and nab(j,1)=nab(j,2)=f-1,

while if, to the specified tolerance, w(f-k)=...=w(f+r), k > 0 and r ≥ 0, then the intervalwill have n(ab(j,1))=nab(j,1)=f-k and n(ab(j,2))=nab(j,2)=f+r. The cases w(l) <w(l+1) and w(l-r)=...=w(l+k) are handled similarly.

?laed0Used by ?stedc. Computes all eigenvalues andcorresponding eigenvectors of an unreducedsymmetric tridiagonal matrix using the divide andconquer method.

Syntax

call slaed0( icompq, qsiz, n, d, e, q, ldq, qstore, ldqs, work, iwork, info)

call dlaed0( icompq, qsiz, n, d, e, q, ldq, qstore, ldqs, work, iwork, info)

call claed0( qsiz, n, d, e, q, ldq, qstore, ldqs, rwork, iwork, info )

call zlaed0( qsiz, n, d, e, q, ldq, qstore, ldqs, rwork, iwork, info )

Description

Real flavors of this routine compute all eigenvalues and (optionally) corresponding eigenvectorsof a symmetric tridiagonal matrix using the divide and conquer method.

Complex flavors claed0/zlaed0 compute all eigenvalues of a symmetric tridiagonal matrixwhich is one diagonal block of those from reducing a dense or band Hermitian matrix andcorresponding eigenvectors of the dense or band matrix.

Input Parameters

INTEGER. Used with real flavors only.icompqIf icompq = 0, compute eigenvalues only.

1234

If icompq = 1, compute eigenvectors of original densesymmetric matrix also.On entry, the array q must contain the orthogonal matrixused to reduce the original matrix to tridiagonal form.If icompq = 2, compute eigenvalues and eigenvectors ofthe tridiagonal matrix.

INTEGER.qsizThe dimension of the orthogonal/unitary matrix used to

reduce the full matrix to tridiagonal form; qsiz ≥ n (for

real flavors, qsiz ≥ n if icompq = 1).

INTEGER. The dimension of the symmetric tridiagonal matrix

(n ≥ 0).

n

REAL for single-precision flavorsd, e, rworkDOUBLE PRECISION for double-precision flavors. Arrays:d(*) contains the main diagonal of the tridiagonal matrix.The dimension of d must be at least max(1, n).e(*) contains the off-diagonal elements of the tridiagonalmatrix. The dimension of e must be at least max(1, n-1).rwork(*) is a workspace array used in complex flavors only.The dimension of rwork must be at least (1+3n+2nlg(n)+3n2), where lg(n) = smallest integer k such

that 2k ≥ n.

REAL for slaed0q, qstoreDOUBLE PRECISION for dlaed0COMPLEX for claed0COMPLEX*16 for zlaed0.Arrays: q(ldq, *), qstore(ldqs, *). The second dimensionof these arrays must be at least max(1, n).For real flavors:If icompq = 0, array q is not referenced.If icompq = 1, on entry, q is a subset of the columns ofthe orthogonal matrix used to reduce the full matrix totridiagonal form corresponding to the subset of the fullmatrix which is being decomposed at this time.

1235


If icompq = 2, on entry, q will be the identity matrix. Thearray qstore is a workspace array referenced only whenicompq = 1. Used to store parts of the eigenvector matrixwhen the updating matrix multiplies take place.For complex flavors:On entry, q must contain an qsiz-by-n matrix whosecolumns are unitarily orthonormal. It is a part of the unitarymatrix that reduces the full dense Hermitian matrix to a(reducible) symmetric tridiagonal matrix. The array qstoreis a workspace array used to store parts of the eigenvectormatrix when the updating matrix multiplies take place.

INTEGER. The first dimension of the array q; ldq ≥ max(1,n).

ldq

INTEGER. The first dimension of the array qstore; ldqs ≥max(1, n).

ldqs

REAL for slaed0workDOUBLE PRECISION for dlaed0.Workspace array, used in real flavors only.If icompq = 0 or 1, the dimension of work must be atleast (1 +3n+2nlg(n)+2n2), where lg(n) = smallest

integer k such that 2k ≥ n.If icompq = 2, the dimension of work must be at least(4n+n2).

INTEGER.iworkWorkspace array.For real flavors, if icompq = 0 or 1, and for complex flavors,the dimension of iwork must be at least (6+6n+5nlg(n)).For real flavors, if icompq = 2, the dimension of iworkmust be at least (3+5n).

Output Parameters

On exit, contains eigenvalues in ascending order.d

On exit, the array has been destroyed.e

If icompq = 2, on exit, q contains the eigenvectors of thetridiagonal matrix.

q

1236


INTEGER.infoIf info = 0, the execution is successful.If info = -i, the i-th parameter had an illegal value.If info = i >0, the algorithm failed to compute aneigenvalue while working on the submatrix lying in rowsand columns i/(n+1) through mod(i, n+1).

?laed1Used by sstedc/dstedc. Computes the updatedeigensystem of a diagonal matrix after modificationby a rank-one symmetric matrix. Used when theoriginal matrix is tridiagonal.

Syntax

call slaed1( n, d, q, ldq, indxq, rho, cutpnt, work, iwork, info )

call dlaed1( n, d, q, ldq, indxq, rho, cutpnt, work, iwork, info )

Description

The routine ?laed1 computes the updated eigensystem of a diagonal matrix after modificationby a rank-one symmetric matrix. This routine is used only for the eigenproblem which requiresall eigenvalues and eigenvectors of a tridiagonal matrix. ?laed7 handles the case in whicheigenvalues only or eigenvalues and eigenvectors of a full symmetric matrix (which was reducedto tridiagonal form) are desired.

T = Q(in)(D(in)+ rho*Z*Z')*Q'(in) = Q(out)*D(out)*Q'(out)

where Z = Q'u, u is a vector of length n with ones in the cutpnt and (cutpnt+1) -th elementsand zeros elsewhere. The eigenvectors of the original matrix are stored in Q, and the eigenvaluesare in D. The algorithm consists of three stages:

The first stage consists of deflating the size of the problem when there are multiple eigenvaluesor if there is a zero in the z vector. For each such occurrence the dimension of the secularequation problem is reduced by one. This stage is performed by the routine ?laed2.

The second stage consists of calculating the updated eigenvalues. This is done by finding theroots of the secular equation via the routine ?laed4 (as called by ?laed3). This routine alsocalculates the eigenvectors of the current problem.

1237


The final stage consists of computing the updated eigenvectors directly using the updatedeigenvalues. The eigenvectors for the current problem are multiplied with the eigenvectorsfrom the overall problem.

Input Parameters


(n ≥ 0).

n

REAL for slaed1d, q, workDOUBLE PRECISION for dlaed1.Arrays:d(*) contains the eigenvalues of the rank-1-perturbedmatrix. The dimension of d must be at least max(1, n).q(ldq, *) contains the eigenvectors of therank-1-perturbed matrix. The second dimension of q mustbe at least max(1, n).work(*) is a workspace array, dimension at least (4n+n2).


ldq

INTEGER. Array, dimension (n).indxqOn entry, the permutation which separately sorts the twosubproblems in d into ascending order.

REAL for slaed1rhoDOUBLE PRECISION for dlaed1.The subdiagonal entry used to create the rank-1modification.

INTEGER.cutpntThe location of the last eigenvalue in the leading sub-matrix.

min(1,n) ≤ cutpnt ≤ n/2.

INTEGER.iworkWorkspace array, dimension (4n).

Output Parameters

On exit, contains the eigenvalues of the repaired matrix.d

On exit, q contains the eigenvectors of the repairedtridiagonal matrix.

q

1238


On exit, contains the permutation which will reintegrate thesubproblems back into sorted order, that is, d( indxq(i= 1, n )) will be in ascending order.

indxq

INTEGER.infoIf info = 0, the execution is successful.If info = -i, the i-th parameter had an illegal value. Ifinfo = 1, an eigenvalue did not converge.

?laed2Used by sstedc/dstedc. Merges eigenvalues anddeflates secular equation. Used when the originalmatrix is tridiagonal.

Syntax

call slaed2( k, n, n1, d, q, ldq, indxq, rho, z, dlamda, w, q2, indx, indxc,indxp, coltyp, info )

call dlaed2( k, n, n1, d, q, ldq, indxq, rho, z, dlamda, w, q2, indx, indxc,indxp, coltyp, info )

Description

The routine ?laed2 merges the two sets of eigenvalues together into a single sorted set. Thenit tries to deflate the size of the problem. There are two ways in which deflation can occur:when two or more eigenvalues are close together or if there is a tiny entry in the z vector. Foreach such occurrence the order of the related secular equation problem is reduced by one.

Input Parameters

INTEGER. The number of non-deflated eigenvalues, and the

order of the related secular equation (0 ≤ k ≤ n).

k


(n ≥ 0).

n

INTEGER. The location of the last eigenvalue in the leading

sub-matrix; min(1,n) ≤ n1 ≤ n/2.

n1

REAL for slaed2d, q, zDOUBLE PRECISION for dlaed2.

1239


Arrays:d(*) contains the eigenvalues of the two submatrices to becombined. The dimension of d must be at least max(1, n).q(ldq, *) contains the eigenvectors of the two submatricesin the two square blocks with corners at (1,1), (n1,n1) and(n1+1,n1+1), (n,n). The second dimension of q must be atleast max(1, n).z(*) contains the updating vector (the last row of the firstsub-eigenvector matrix and the first row of the secondsub-eigenvector matrix).


ldq

INTEGER. Array, dimension (n).indxqOn entry, the permutation which separately sorts the twosubproblems in d into ascending order. Note that elementsin the second half of this permutation must first have n1added to their values.

REAL for slaed2rhoDOUBLE PRECISION for dlaed2.On entry, the off-diagonal element associated with therank-1 cut which originally split the two submatrices whichare now being recombined.

INTEGER.indx, indxpWorkspace arrays, dimension (n) each. Array indx containsthe permutation used to sort the contents of dlamda intoascending order.Array indxp contains the permutation used to place deflatedvalues of d at the end of the array.indxp(1:k) points to the nondeflated d-values andindxp(k+1:n) points to the deflated eigenvalues.

INTEGER.coltypWorkspace array, dimension (n).During execution, a label which will indicate which of thefollowing types a column in the q2 matrix is:1 : non-zero in the upper half only;2 : dense;3 : non-zero in the lower half only;

1240


4 : deflated.

Output Parameters

On exit, d contains the trailing (n-k) updated eigenvalues(those which were deflated) sorted into increasing order.

d

On exit, q contains the trailing (n-k) updated eigenvectors(those which were deflated) in its last n-k columns.

q

Destroyed on exit.indxq

On exit, rho has been modified to the value required by?laed3.

rho

REAL for slaed2dlamda, w, q2DOUBLE PRECISION for dlaed2.Arrays: dlamda(n), w(n), q2(n12+(n-n1)2).The array dlamda contains a copy of the first k eigenvalueswhich is used by ?laed3 to form the secular equation.The array w contains the first k values of the finaldeflation-altered z-vector which is passed to ?laed3.The array q2 contains a copy of the first k eigenvectorswhich is used by ?laed3 in a matrix multiply (sgemm/dgemm)to solve for the new eigenvectors.

INTEGER. Array, dimension (n).indxcThe permutation used to arrange the columns of the deflatedq matrix into three groups: the first group contains non-zeroelements only at and above n1, the second containsnon-zero elements only below n1, and the third is dense.

On exit, coltyp(i) is the number of columns of type i, fori=1 to 4 only (see the definition of types in the descriptionof coltyp in Input Parameters).

coltyp


1241


?laed3Used by sstedc/dstedc. Finds the roots of thesecular equation and updates the eigenvectors.Used when the original matrix is tridiagonal.

Syntax

call slaed3( k, n, n1, d, q, ldq, rho, dlamda, q2, indx, ctot, w, s, info )

call dlaed3( k, n, n1, d, q, ldq, rho, dlamda, q2, indx, ctot, w, s, info )

Description

The routine ?laed3 finds the roots of the secular equation, as defined by the values in d, w,and rho, between 1 and k.

It makes the appropriate calls to ?laed4 and then updates the eigenvectors by multiplying thematrix of eigenvectors of the pair of eigensystems being combined by the matrix of eigenvectorsof the k-by-k system which is solved here.

This code makes very mild assumptions about floating point arithmetic. It will work on machineswith a guard digit in add/subtract, or on those binary machines without guard digits whichsubtract like the Cray X-MP, Cray Y-MP, Cray C-90, or Cray-2. It could conceivably fail onhexadecimal or decimal machines without guard digits, but none are known.

Input Parameters

INTEGER. The number of terms in the rational function to

be solved by ?laed4 (k ≥ 0).

k

INTEGER. The number of rows and columns in the q matrix.

n ≥ k (deflation may result in n >k).

n


sub-matrix; min(1,n) ≤ n1 ≤ n/2.

n1

REAL for slaed3qDOUBLE PRECISION for dlaed3.Array q(ldq, *). The second dimension of q must be atleast max(1, n).Initially, the first k columns of this array are used asworkspace.

1242



ldq

REAL for slaed3rhoDOUBLE PRECISION for dlaed3.The value of the parameter in the rank one update equation.

rho ≥ 0 required.

REAL for slaed3dlamda, q2, wDOUBLE PRECISION for dlaed3.Arrays: dlamda(k), q2(ldq2, *), w(k).The first k elements of the array dlamda contain the oldroots of the deflated updating problem. These are the polesof the secular equation.The first k columns of the array q2 contain the non-deflatedeigenvectors for the split problem. The second dimensionof q2 must be at least max(1, n).The first k elements of the array w contain the componentsof the deflation-adjusted updating vector.

INTEGER. Array, dimension (n).indxThe permutation used to arrange the columns of the deflatedq matrix into three groups (see ?laed2).The rows of the eigenvectors found by ?laed4 must belikewise permuted before the matrix multiply can take place.

INTEGER. Array, dimension (4).ctotA count of the total number of the various types of columnsin q, as described in indx. The fourth column type is anycolumn which has been deflated.

REAL for slaed3sDOUBLE PRECISION for dlaed3.Workspace array, dimension (n1+1)*k .Will contain the eigenvectors of the repaired matrix whichwill be multiplied by the previously accumulated eigenvectorsto update the system.

Output Parameters

REAL for slaed3dDOUBLE PRECISION for dlaed3.

1243


Array, dimension at least max(1, n).

d(i) contains the updated eigenvalues for 1 ≤ i ≤ k.

On exit, the columns 1 to k of q contain the updatedeigenvectors.

q

May be changed on output by having lowest order bit setto zero on Cray X-MP, Cray Y-MP, Cray-2, or Cray C-90, asdescribed above.

dlamda

Destroyed on exit.w

INTEGER.infoIf info = 0, the execution is successful.If info = -i, the i-th parameter had an illegal value.If info = 1, an eigenvalue did not converge.

?laed4Used by sstedc/dstedc. Finds a single root of thesecular equation.

Syntax

call slaed4( n, i, d, z, delta, rho, dlam, info )

call dlaed4( n, i, d, z, delta, rho, dlam, info )

Description

This subroutine computes the i-th updated eigenvalue of a symmetric rank-one modificationto a diagonal matrix whose elements are given in the array d, and that

D(i) < D(j) for i < j

and that rho > 0. This is arranged by the calling routine, and is no loss in generality. Therank-one modified system is thus

diag(D) + rho*Z * transpose(Z).

where we assume the Euclidean norm of Z is 1.

The method consists of approximating the rational functions in the secular equation by simplerinterpolating rational functions.

1244

Input Parameters

INTEGER. The length of all arrays.n

INTEGER. The index of the eigenvalue to be computed; 1

≤ i ≤ n.

i

REAL for slaed4d, zDOUBLE PRECISION for dlaed4Arrays, dimension (n) each. The array d contains the originaleigenvalues. It is assumed that they are in order, d(i) <d(j) for i < j.The array z contains the components of the updating vectorZ.

REAL for slaed4rhoDOUBLE PRECISION for dlaed4The scalar in the symmetric updating formula.

Output Parameters

REAL for slaed4deltaDOUBLE PRECISION for dlaed4Array, dimension (n).

If n ≠ 1, delta contains (d(j) - lambda_i) in its j-thcomponent. If n = 1, then delta(1) = 1. The vector deltacontains the information necessary to construct theeigenvectors.

REAL for slaed4dlamDOUBLE PRECISION for dlaed4The computed lambda_i, the i-th updated eigenvalue.

INTEGER.infoIf info = 0, the execution is successful.If info = 1, the updating process failed.

1245

?laed5Used by sstedc/dstedc. Solves the 2-by-2 secularequation.

Syntax

call slaed5( i, d, z, delta, rho, dlam )

call dlaed5( i, d, z, delta, rho, dlam )

Description

This subroutine computes the i-th eigenvalue of a symmetric rank-one modification of a 2-by-2diagonal matrix

diag(D) + rho*Z * transpose(Z).

The diagonal elements in the array D are assumed to satisfy

D(i) < D(j) for i < j .

We also assume rho > 0 and that the Euclidean norm of the vector Z is one.

Input Parameters

INTEGER. The index of the eigenvalue to be computed;i

1 ≤ i ≤ 2.

REAL for slaed5d, zDOUBLE PRECISION for dlaed5Arrays, dimension (2) each. The array d contains the originaleigenvalues. It is assumed that d(1) < d(2).The array z contains the components of the updating vector.

REAL for slaed5rhoDOUBLE PRECISION for dlaed5The scalar in the symmetric updating formula.

Output Parameters

REAL for slaed5deltaDOUBLE PRECISION for dlaed5Array, dimension (2).

1246

The vector delta contains the information necessary toconstruct the eigenvectors.

REAL for slaed5dlamDOUBLE PRECISION for dlaed5The computed lambda_i, the i-th updated eigenvalue.

?laed6Used by sstedc/dstedc. Computes one Newtonstep in solution of the secular equation.

Syntax

call slaed6( kniter, orgati, rho, d, z, finit, tau, info )

call dlaed6( kniter, orgati, rho, d, z, finit, tau, info )

Description

This routine computes the positive or negative root (closest to the origin) of

It is assumed that if orgati = .TRUE. the root is between d(2) and d(3);otherwise it is betweend(1) and d(2) This routine is called by ?laed4 when necessary. In most cases, the root soughtis the smallest in magnitude, though it might not be in some extremely rare situations.

Input Parameters

INTEGER.kniterRefer to ?laed4 for its significance.

LOGICAL.orgatiIf orgati = .TRUE., the needed root is between d(2) andd(3); otherwise it is between d(1) and d(2). See ?laed4for further details.

REAL for slaed6rhoDOUBLE PRECISION for dlaed6

1247


Refer to the equation for f(x) above.

REAL for slaed6d, zDOUBLE PRECISION for dlaed6Arrays, dimension (3) each.The array d satisfies d(1) < d(2) < d(3).Each of the elements in the array z must be positive.

REAL for slaed6finitDOUBLE PRECISION for dlaed6The value of f(x) at 0. It is more accurate than the oneevaluated inside this routine (if someone wants to do so).

Output Parameters

REAL for slaed6tauDOUBLE PRECISION for dlaed6The root of the equation for f(x).

INTEGER.infoIf info = 0, the execution is successful.If info = 1, failure to converge.

1248

?laed7Used by ?stedc. Computes the updatedeigensystem of a diagonal matrix after modificationby a rank-one symmetric matrix. Used when theoriginal matrix is dense.

Syntax

call slaed7( icompq, n, qsiz, tlvls, curlvl, curpbm, d, q, ldq, indxq, rho,cutpnt, qstore, qptr, prmptr, perm, givptr, givcol, givnum, work, iwork, info)

call dlaed7( icompq, n, qsiz, tlvls, curlvl, curpbm, d, q, ldq, indxq, rho,cutpnt, qstore, qptr, prmptr, perm, givptr, givcol, givnum, work, iwork, info)

call claed7( n, cutpnt, qsiz, tlvls, curlvl, curpbm, d, q, ldq, rho, indxq,qstore, qptr, prmptr, perm, givptr, givcol, givnum, work, rwork, iwork, info)

call zlaed7( n, cutpnt, qsiz, tlvls, curlvl, curpbm, d, q, ldq, rho, indxq,qstore, qptr, prmptr, perm, givptr, givcol, givnum, work, rwork, iwork, info)

Description

The routine ?laed7 computes the updated eigensystem of a diagonal matrix after modificationby a rank-one symmetric matrix. This routine is used only for the eigenproblem which requiresall eigenvalues and optionally eigenvectors of a dense symmetric/Hermitian matrix that hasbeen reduced to tridiagonal form. For real flavors, slaed1/dlaed1 handles the case in whichall eigenvalues and eigenvectors of a symmetric tridiagonal matrix are desired.

T = Q(in)*(D(in)+rho*Z*Z')*Q'(in) = Q(out)*D(out)*Q'(out)

where Z = Q'*u, u is a vector of length n with ones in the cutpnt and (cutpnt + 1) -thelements and zeros elsewhere. The eigenvectors of the original matrix are stored in Q, and theeigenvalues are in D. The algorithm consists of three stages:

The first stage consists of deflating the size of the problem when there are multiple eigenvaluesor if there is a zero in the z vector. For each such occurrence the dimension of the secularequation problem is reduced by one. This stage is performed by the routine slaed8/dlaed8(for real flavors) or by the routine slaed2/dlaed2 (for complex flavors).

1249


The second stage consists of calculating the updated eigenvalues. This is done by finding theroots of the secular equation via the routine ?laed4 (as called by ?laed9 or ?laed3). Thisroutine also calculates the eigenvectors of the current problem.

The final stage consists of computing the updated eigenvectors directly using the updatedeigenvalues. The eigenvectors for the current problem are multiplied with the eigenvectorsfrom the overall problem.

Input Parameters

INTEGER. Used with real flavors only.icompqIf icompq = 0, compute eigenvalues only.If icompq = 1, compute eigenvectors of original densesymmetric matrix also. On entry, the array q must containthe orthogonal matrix used to reduce the original matrix totridiagonal form.


(n ≥ 0).

n


sub-matrix. min(1,n) ≤ cutpnt ≤ n .

cutpnt

INTEGER.qsizThe dimension of the orthogonal/unitary matrix used to



INTEGER. The total number of merging levels in the overalldivide and conquer tree.

tlvls

INTEGER. The current level in the overall merge routine, 0

≤ curlvl ≤ tlvls .

curlvl

INTEGER. The current problem in the current level in theoverall merge routine (counting from upper left to lowerright).

curpbm

REAL for slaed7/claed7dDOUBLE PRECISION for dlaed7/zlaed7.Array, dimension at least max(1, n).Array d(*) contains the eigenvalues of the rank-1-perturbedmatrix.

1250


REAL for slaed7q, workDOUBLE PRECISION for dlaed7COMPLEX for claed7COMPLEX*16 for zlaed7.Arrays:q(ldq, *) contains the the eigenvectors of therank-1-perturbed matrix. The second dimension of q mustbe at least max(1, n).work(*) is a workspace array, dimension at least(3n+qsiz*n) for real flavors and at least (qsiz*n) forcomplex flavors.


ldq

REAL for slaed7 /claed7rhoDOUBLE PRECISION for dlaed7/zlaed7.The subdiagonal element used to create the rank-1modification.

REAL for slaed7/claed7qstoreDOUBLE PRECISION for dlaed7/zlaed7.Array, dimension (n2+1). Serves also as output parameter.Stores eigenvectors of submatrices encountered duringdivide and conquer, packed together. qptr points tobeginning of the submatrices.

INTEGER. Array, dimension (n+2). Serves also as outputparameter. List of indices pointing to beginning ofsubmatrices stored in qstore. The submatrices arenumbered starting at the bottom left of the divide andconquer tree, from left to right and bottom to top.

qptr

INTEGER. Arrays, dimension (n lgn ) each.prmptr, perm, givptrThe array prmptr(*) contains a list of pointers which indicatewhere in perm a level's permutation is stored. prmptr(i+1)- prmptr(i) indicates the size of the permutation and alsothe size of the full, non-deflated problem.The array perm(*) contains the permutations (from deflationand sorting) to be applied to each eigenblock.

1251


The array givptr(*) contains a list of pointers which indicatewhere in givcol a level's Givens rotations are stored.givptr(i+1) - givptr(i) indicates the number of Givensrotations.

INTEGER. Array, dimension (2, n lgn ).givcolEach pair of numbers indicates a pair of columns to takeplace in a Givens rotation.

REAL for slaed7/claed7givnumDOUBLE PRECISION for dlaed7/zlaed7.Array, dimension (2, n lgn).Each number indicates the S value to be used in thecorresponding Givens rotation.

INTEGER.iworkWorkspace array, dimension (4n ).

REAL for claed7rworkDOUBLE PRECISION for zlaed7.Workspace array, dimension (3n+2qsiz*n). Used in complexflavors only.

Output Parameters

On exit, contains the eigenvalues of the repaired matrix.d

On exit, q contains the eigenvectors of the repairedtridiagonal matrix.

q

INTEGER. Array, dimension (n).indxqContains the permutation which will reintegrate thesubproblems back into sorted order, that is,d( indxq(i = 1, n)) will be in ascending order.

INTEGER.infoIf info = 0, the execution is successful.If info = -i, the i-th parameter had an illegal value.If info = 1, an eigenvalue did not converge.

1252


?laed8Used by ?stedc. Merges eigenvalues and deflatessecular equation. Used when the original matrix isdense.

Syntax

call slaed8( icompq, k, n, qsiz, d, q, ldq, indxq, rho, cutpnt, z, dlamda,q2, ldq2, w, perm, givptr, givcol, givnum, indxp, indx, info )

call dlaed8( icompq, k, n, qsiz, d, q, ldq, indxq, rho, cutpnt, z, dlamda,q2, ldq2, w, perm, givptr, givcol, givnum, indxp, indx, info )

call claed8( k, n, qsiz, q, ldq, d, rho, cutpnt, z, dlamda, q2, ldq2, w, indxp,indx, indxq, perm, givptr, givcol, givnum, info )

call zlaed8( k, n, qsiz, q, ldq, d, rho, cutpnt, z, dlamda, q2, ldq2, w, indxp,indx, indxq, perm, givptr, givcol, givnum, info )

Description

This routine merges the two sets of eigenvalues together into a single sorted set. Then it triesto deflate the size of the problem. There are two ways in which deflation can occur: when twoor more eigenvalues are close together or if there is a tiny element in the z vector. For eachsuch occurrence the order of the related secular equation problem is reduced by one.

Input Parameters

INTEGER. Used with real flavors only.icompqIf icompq = 0, compute eigenvalues only.If icompq = 1, compute eigenvectors of original densesymmetric matrix also.On entry, the array q must contain the orthogonal matrixused to reduce the original matrix to tridiagonal form.


(n ≥ 0).

n


sub-matrix. min(1,n) ≤ cutpnt ≤ n .

cutpnt

INTEGER.qsiz

1253


The dimension of the orthogonal/unitary matrix used to



REAL for slaed8/claed8d, zDOUBLE PRECISION for dlaed8/zlaed8.Arrays, dimension at least max(1, n) each. The array d(*)contains the eigenvalues of the two submatrices to becombined.On entry, z(*) contains the updating vector (the last rowof the first sub-eigenvector matrix and the first row of thesecond sub-eigenvector matrix). The contents of z aredestroyed by the updating process.

REAL for slaed8qDOUBLE PRECISION for dlaed8COMPLEX for claed8COMPLEX*16 for zlaed8.Arrayq(ldq, *). The second dimension of q must be at leastmax(1, n). On entry, q contains the eigenvectors of thepartially solved system which has been previously updatedin matrix multiplies with other partially solved eigensystems.For real flavors, If icompq = 0, q is not referenced.


ldq

INTEGER. The first dimension of the output array q2; ldq2

≥ max(1, n).

ldq2

INTEGER. Array, dimension (n).indxqThe permutation which separately sorts the twosub-problems in d into ascending order. Note that elementsin the second half of this permutation must first have cutpntadded to their values in order to be accurate.

REAL for slaed8/claed8rhoDOUBLE PRECISION for dlaed8/zlaed8.On entry, the off-diagonal element associated with therank-1 cut which originally split the two submatrices whichare now being recombined.

1254


Output Parameters

INTEGER. The number of non-deflated eigenvalues, and theorder of the related secular equation.

k

On exit, contains the trailing (n-k) updated eigenvalues(those which were deflated) sorted into increasing order.

d

On exit, q contains the trailing (n-k) updated eigenvectors(those which were deflated) in its last (n-k) columns.

q

On exit, rho has been modified to the value required by?laed3.

rho

REAL for slaed8/claed8dlamda, wDOUBLE PRECISION for dlaed8/zlaed8.Arrays, dimension (n) each. The array dlamda(*) containsa copy of the first k eigenvalues which will be used by?laed3 to form the secular equation.The array w(*) will hold the first k values of the finaldeflation-altered z-vector and will be passed to ?laed3.

REAL for slaed8q2DOUBLE PRECISION for dlaed8COMPLEX for claed8COMPLEX*16 for zlaed8.Arrayq2(ldq2, *). The second dimension of q2 must be at leastmax(1, n).Contains a copy of the first k eigenvectors which will beused by slaed7/dlaed7 in a matrix multiply (sgemm/dgemm)to update the new eigenvectors. For real flavors, If icompq= 0, q2 is not referenced.

INTEGER. Workspace arrays, dimension (n) each.indxp, indxThe array indxp(*) will contain the permutation used toplace deflated values of d at the end of the array. On output,indxp(1:k) points to the nondeflated d-values andindxp(k+1:n) points to the deflated eigenvalues.The array indx(*) will contain the permutation used to sortthe contents of d into ascending order.

INTEGER. Array, dimension (n ).perm

1255


Contains the permutations (from deflation and sorting) tobe applied to each eigenblock.

INTEGER. Contains the number of Givens rotations whichtook place in this subproblem.

givptr

INTEGER. Array, dimension (2, n ).givcolEach pair of numbers indicates a pair of columns to takeplace in a Givens rotation.

REAL for slaed8/claed8givnumDOUBLE PRECISION for dlaed8/zlaed8.Array, dimension (2, n).Each number indicates the S value to be used in thecorresponding Givens rotation.


?laed9Used by sstedc/dstedc. Finds the roots of thesecular equation and updates the eigenvectors.Used when the original matrix is dense.

Syntax

call slaed9( k, kstart, kstop, n, d, q, ldq, rho, dlamda, w, s, lds, info )

call dlaed9( k, kstart, kstop, n, d, q, ldq, rho, dlamda, w, s, lds, info )

Description

This routine finds the roots of the secular equation, as defined by the values in d, z, and rho,between kstart and kstop. It makes the appropriate calls to slaed4/dlaed4 and then storesthe new matrix of eigenvectors for use in calculating the next level of z vectors.

Input Parameters


be solved by slaed4/dlaed4 (k ≥ 0).

k

INTEGER. The updated eigenvalues lambda(i),kstart, kstop

1256


kstart ≤ i ≤ kstop are to be computed.

1 ≤ kstart ≤ kstop ≤ k.

INTEGER. The number of rows and columns in the Q matrix.

n ≥ k (deflation may result in n > k).

n

REAL for slaed9qDOUBLE PRECISION for dlaed9.Workspace array, dimension (ldq, *).The second dimension of q must be at least max(1, n).


ldq

REAL for slaed9rhoDOUBLE PRECISION for dlaed9The value of the parameter in the rank one update equation.

rho ≥ 0 required.

REAL for slaed9dlamda, wDOUBLE PRECISION for dlaed9Arrays, dimension (k) each. The first k elements of the arraydlamda(*) contain the old roots of the deflated updatingproblem. These are the poles of the secular equation.The first k elements of the array w(*) contain thecomponents of the deflation-adjusted updating vector.

INTEGER. The first dimension of the output array s; lds ≥max(1, k).

lds

Output Parameters

REAL for slaed9dDOUBLE PRECISION for dlaed9Array, dimension (n). d (i ) contains the updated eigenvalues

for kstart ≤ i ≤ kstop.

REAL for slaed9sDOUBLE PRECISION for dlaed9.Array, dimension (lds, *) .

1257


The second dimension of s must be at least max(1, k).Will contain the eigenvectors of the repaired matrix whichwill be stored for subsequent z vector calculation andmultiplied by the previously accumulated eigenvectors toupdate the system.

INTEGER.infoIf info = 0, the execution is successful.If info = -i, the i-th parameter had an illegal value. Ifinfo = 1, the eigenvalue did not converge.

?laedaUsed by ?stedc. Computes the Z vectordetermining the rank-one modification of thediagonal matrix. Used when the original matrix isdense.

Syntax

call slaeda( n, tlvls, curlvl, curpbm, prmptr, perm, givptr, givcol, givnum,q, qptr, z, ztemp, info )

call dlaeda( n, tlvls, curlvl, curpbm, prmptr, perm, givptr, givcol, givnum,q, qptr, z, ztemp, info )

Description

The routine ?laeda computes the Z vector corresponding to the merge step in the curlvl-thstep of the merge process with tlvls steps for the curpbm-th problem.

Input Parameters


(n ≥ 0).

n

INTEGER. The total number of merging levels in the overalldivide and conquer tree.

tlvls

INTEGER. The current level in the overall merge routine, 0

≤ curlvl ≤ tlvls .

curlvl

1258


INTEGER. The current problem in the current level in theoverall merge routine (counting from upper left to lowerright).

curpbm

INTEGER. Arrays, dimension (n lgn ) each.prmptr, perm, givptrThe array prmptr(*) contains a list of pointers which indicatewhere in perm a level's permutation is stored. prmptr(i+1)- prmptr(i) indicates the size of the permutation and alsothe size of the full, non-deflated problem.The array perm(*) contains the permutations (from deflationand sorting) to be applied to each eigenblock.The array givptr(*) contains a list of pointers which indicatewhere in givcol a level's Givens rotations are stored.givptr(i+1) - givptr(i) indicates the number of Givensrotations.

INTEGER. Array, dimension (2, n lgn ).givcolEach pair of numbers indicates a pair of columns to takeplace in a Givens rotation.

REAL for slaedagivnumDOUBLE PRECISION for dlaeda.Array, dimension (2, n lgn).Each number indicates the S value to be used in thecorresponding Givens rotation.

REAL for slaedaqDOUBLE PRECISION for dlaeda.Array, dimension ( n2).Contains the square eigenblocks from previous levels, thestarting positions for blocks are given by qptr.

INTEGER. Array, dimension (n+2). Contains a list of pointerswhich indicate where in q an eigenblock is stored. sqrt(qptr(i+1) - qptr(i)) indicates the size of the block.

qptr

REAL for slaedaztempDOUBLE PRECISION for dlaeda.Workspace array, dimension (n).

Output Parameters

REAL for slaedaz

1259


DOUBLE PRECISION for dlaeda.Array, dimension (n). Contains the updating vector (the lastrow of the first sub-eigenvector matrix and the first row ofthe second sub-eigenvector matrix).


?laeinComputes a specified right or left eigenvector ofan upper Hessenberg matrix by inverse iteration.

Syntax

call slaein( rightv, noinit, n, h, ldh, wr, wi, vr, vi, b, ldb, work, eps3,smlnum, bignum, info )

call dlaein( rightv, noinit, n, h, ldh, wr, wi, vr, vi, b, ldb, work, eps3,smlnum, bignum, info )

call claein( rightv, noinit, n, h, ldh, w, v, b, ldb, rwork, eps3, smlnum,info )

call zlaein( rightv, noinit, n, h, ldh, w, v, b, ldb, rwork, eps3, smlnum,info )

Description

The routine ?laein uses inverse iteration to find a right or left eigenvector corresponding tothe eigenvalue (wr,wi) of a real upper Hessenberg matrix H (for real flavors slaein/dlaein)or to the eigenvalue w of a complex upper Hessenberg matrix H (for complex flavorsclaein/zlaein).

Input Parameters

LOGICAL.rightvIf rightv = .TRUE., compute right eigenvector;if rightv = .FALSE., compute left eigenvector.

LOGICAL.noinit

1260


If noinit = .TRUE., no initial vector is supplied in (vr,vi)or in v (for complex flavors);if noinit = .FALSE., initial vector is supplied in (vr,vi)or in v (for complex flavors).


REAL for slaeinhDOUBLE PRECISION for dlaeinCOMPLEX for claeinCOMPLEX*16 for zlaein.Array h(ldh, *).The second dimension of h must be at least max(1, n).Contains the upper Hessenberg matrix H.

INTEGER. The first dimension of the array h; ldh ≥ max(1,n).

ldh

REAL for slaeinwr, wiDOUBLE PRECISION for dlaein.The real and imaginary parts of the eigenvalue of H whosecorresponding right or left eigenvector is to be computed(for real flavors of the routine).

COMPLEX for claeinwCOMPLEX*16 for zlaein.The eigenvalue of H whose corresponding right or lefteigenvector is to be computed (for complex flavors of theroutine).

REAL for slaeinvr, viDOUBLE PRECISION for dlaein.Arrays, dimension (n) each. Used for real flavors only. Onentry, if noinit = .FALSE. and wi = 0.0, vr must containa real starting vector for inverse iteration using the realeigenvalue wr;

if noinit = .FALSE. and wi ≠ 0.0, vr and vi mustcontain the real and imaginary parts of a complex startingvector for inverse iteration using the complex eigenvalue(wr,wi);otherwise vr and vi need not be set.

COMPLEX for claeinvCOMPLEX*16 for zlaein.

1261


Array, dimension (n). Used for complex flavors only. Onentry, if noinit = .FALSE., v must contain a startingvector for inverse iteration; otherwise v need not be set.

REAL for slaeinbDOUBLE PRECISION for dlaeinCOMPLEX for claeinCOMPLEX*16 for zlaein.Workspace array b(ldb, *). The second dimension of b mustbe at least max(1, n).

INTEGER. The first dimension of the array b;ldb

ldb ≥ n+1 for real flavors;

ldb ≥ max(1, n) for complex flavors.

REAL for slaeinworkDOUBLE PRECISION for dlaein.Workspace array, dimension (n).Used for real flavors only.

REAL for claeinrworkDOUBLE PRECISION for zlaein.Workspace array, dimension (n).Used for complex flavors only.

REAL for slaein/claeineps3, smlnumDOUBLE PRECISION for dlaein/zlaein.eps3 is a small machine-dependent value which is used toperturb close eigenvalues, and to replace zero pivots.smlnum is a machine-dependent value close to underflowthreshold.

REAL for slaeinbignumDOUBLE PRECISION for dlaein.bignum is a machine-dependent value close to overflowthreshold. Used for real flavors only.

Output Parameters

On exit, if wi = 0.0 (real eigenvalue), vr contains the

computed real eigenvector; if wi ≠ 0.0 (complexeigenvalue), vr and vi contain the real and imaginary parts

vr, vi

1262


of the computed complex eigenvector. The eigenvector isnormalized so that the component of largest magnitude hasmagnitude 1; here the magnitude of a complex number(x,y) is taken to be |x| + |y|.vi is not referenced if wi = 0.0.

On exit, v contains the computed eigenvector, normalizedso that the component of largest magnitude has magnitude1; here the magnitude of a complex number (x,y) is takento be |x| + |y|.

v

INTEGER.infoIf info = 0, the execution is successful.If info = 1, inverse iteration did not converge. For real

flavors, vr is set to the last iterate, and so is vi, if wi ≠0.0. For complex flavors, v is set to the last iterate.

?laev2Computes the eigenvalues and eigenvectors of a2-by-2 symmetric/Hermitian matrix.

Syntax

call slaev2( a, b, c, rt1, rt2, cs1, sn1 )

call dlaev2( a, b, c, rt1, rt2, cs1, sn1 )

call claev2( a, b, c, rt1, rt2, cs1, sn1 )

call zlaev2( a, b, c, rt1, rt2, cs1, sn1 )

Description

This routine performs the eigendecomposition of a 2-by-2 symmetric matrix

(for claev2/zlaev2).

1263


On return, rt1 is the eigenvalue of larger absolute value, rt2 of smaller absolute value, and(cs1, sn1) is the unit right eigenvector for rt1, giving the decomposition

(for slaev2/dlaev2),

or

(for claev2/zlaev2).

Input Parameters

REAL for slaev2a, b, cDOUBLE PRECISION for dlaev2COMPLEX for claev2COMPLEX*16 for zlaev2.Elements of the input matrix.

Output Parameters

REAL for slaev2/claev2rt1, rt2DOUBLE PRECISION for dlaev2/zlaev2.Eigenvalues of larger and smaller absolute value,respectively.

REAL for slaev2/claev2cs1DOUBLE PRECISION for dlaev2/zlaev2.

REAL for slaev2sn1DOUBLE PRECISION for dlaev2COMPLEX for claev2

1264


COMPLEX*16 for zlaev2.The vector (cs1, sn1) is the unit right eigenvector for rt1.

Application Notes

rt1 is accurate to a few ulps barring over/underflow. rt2 may be inaccurate if there is massivecancellation in the determinant a*c-b*b; higher precision or correctly rounded or correctlytruncated arithmetic would be needed to compute rt2 accurately in all cases. cs1 and sn1 areaccurate to a few ulps barring over/underflow. Overflow is possible only if rt1 is within a factorof 5 of overflow. Underflow is harmless if the input data is 0 or exceeds underflow_threshold/ macheps.

?laexcSwaps adjacent diagonal blocks of a real upperquasi-triangular matrix in Schur canonical form,by an orthogonal similarity transformation.

Syntax

call slaexc( wantq, n, t, ldt, q, ldq, j1, n1, n2, work, info )

call dlaexc( wantq, n, t, ldt, q, ldq, j1, n1, n2, work, info )

Description

This routine swaps adjacent diagonal blocks T11 and T22 of order 1 or 2 in an upperquasi-triangular matrix T by an orthogonal similarity transformation.

T must be in Schur canonical form, that is, block upper triangular with 1-by-1 and 2-by-2diagonal blocks; each 2-by-2 diagonal block has its diagonal elements equal and its off-diagonalelements of opposite sign.

Input Parameters

LOGICAL.wantqIf wantq = .TRUE., accumulate the transformation in thematrix Q;If wantq = .FALSE., do not accumulate the transformation.


REAL for slaexct, q

1265


DOUBLE PRECISION for dlaexcArrays:t(ldt,*) contains on entry the upper quasi-triangularmatrix T, in Schur canonical form.The second dimension of t must be at least max(1, n).q(ldq,*) contains on entry, if wantq = .TRUE., theorthogonal matrix Q. If wantq = .FALSE., q is notreferenced. The second dimension of q must be at leastmax(1, n).



If wantq = .FALSE., then ldq ≥ 1.

If wantq = .TRUE., then ldq ≥ max(1,n).

INTEGER. The index of the first row of the first block T11.j1

INTEGER. The order of the first block T11n1(n1 = 0, 1, or 2).

INTEGER. The order of the second block T22n2(n2 = 0, 1, or 2).

REAL for slaexc;workDOUBLE PRECISION for dlaexc.Workspace array, DIMENSION (n).

Output Parameters

On exit, the updated matrix T, again in Schur canonicalform.

t

On exit, if wantq = .TRUE., the updated matrix Q.q

INTEGER.infoIf info = 0, the execution is successful.If info = 1, the transformed matrix T would be too farfrom Schur form; the blocks are not swapped and T and Qare unchanged.

1266


?lag2Computes the eigenvalues of a 2-by-2 generalizedeigenvalue problem, with scaling as necessary toavoid over-/underflow.

Syntax

call slag2( a, lda, b, ldb, safmin, scale1, scale2, wr1, wr2, wi )

call dlag2( a, lda, b, ldb, safmin, scale1, scale2, wr1, wr2, wi )

Description

This routine computes the eigenvalues of a 2 x 2 generalized eigenvalue problem A - w *B, withscaling as necessary to avoid over-/underflow. The scaling factor, s, results in a modifiedeigenvalue equation

s*A - w*B ,

where s is a non-negative scaling factor chosen so that w, w*B, and s*A do not overflow and,if possible, do not underflow, either.

Input Parameters

REAL for slag2a, bDOUBLE PRECISION for dlag2Arrays:a(lda,2) contains, on entry, the 2 x 2 matrix A. It isassumed that its 1-norm is less than 1/safmin. Entries lessthan sqrt(safmin)*norm(A) are subject to being treatedas zero.b(ldb,2) contains, on entry, the 2 x 2 upper triangularmatrix B. It is assumed that the one-norm of B is less than1/safmin. The diagonals should be at least sqrt(safmin)times the largest element of B (in absolute value); if adiagonal is smaller than that, then +/- sqrt(safmin) willbe used instead of that diagonal.

INTEGER. The first dimension of a; lda ≥ 2.lda

INTEGER. The first dimension of b; ldb ≥ 2.ldb

REAL for slag2;safmin

1267


DOUBLE PRECISION for dlag2.The smallest positive number such that 1/safmin does notoverflow. (This should always be ?lamch('S') - it is anargument in order to avoid having to call ?lamchfrequently.)

Output Parameters

REAL for slag2;scale1DOUBLE PRECISION for dlag2.A scaling factor used to avoid over-/underflow in theeigenvalue equation which defines the first eigenvalue. Ifthe eigenvalues are complex, then the eigenvalues are (wr1+/- wii)/scale1 (which may lie outside the exponentrange of the machine), scale1=scale2, and scale1 willalways be positive.If the eigenvalues are real, then the first (real) eigenvalueis wr1/scale1, but this may overflow or underflow, and infact, scale1 may be zero or less than the underflowthreshhold if the exact eigenvalue is sufficiently large.

REAL for slag2;scale2DOUBLE PRECISION for dlag2.A scaling factor used to avoid over-/underflow in theeigenvalue equation which defines the second eigenvalue.If the eigenvalues are complex, then scale2=scale1. Ifthe eigenvalues are real, then the second (real) eigenvalueis wr2/scale2, but this may overflow or underflow, and infact, scale2 may be zero or less than the underflowthreshold if the exact eigenvalue is sufficiently large.

REAL for slag2;wr1DOUBLE PRECISION for dlag2.If the eigenvalue is real, then wr1 is scale1 times theeigenvalue closest to the (2,2) element of A*inv(B).If the eigenvalue is complex, then wr1=wr2 is scale1 timesthe real part of the eigenvalues.

REAL for slag2;wr2DOUBLE PRECISION for dlag2.

1268


If the eigenvalue is real, then wr2 is scale2 times the othereigenvalue. If the eigenvalue is complex, then wr1=wr2 isscale1 times the real part of the eigenvalues.

REAL for slag2;wiDOUBLE PRECISION for dlag2.If the eigenvalue is real, then wi is zero. If the eigenvalueis complex, then wi is scale1 times the imaginary part ofthe eigenvalues. wi will always be non-negative.

?lags2Computes 2-by-2 orthogonal matrices U, V, andQ, and applies them to matrices A and B such thatthe rows of the transformed A and B are parallel.

Syntax

call slags2( upper, a1, a2, a3, b1, b2, b3, csu, snu, csv, snv, csq, snq )

call dlags2( upper, a1, a2, a3, b1, b2, b3, csu, snu, csv, snv, csq, snq )

Description

This routine computes 2-by-2 orthogonal matrices U, V and Q, such that if upper = .TRUE.,then

and

or if upper = .FALSE., then

1269


and

The rows of the transformed A and B are parallel, where

Here Z' denotes the transpose of Z.

Input Parameters

LOGICAL.upperIf upper = .TRUE., the input matrices A and B are uppertriangular; If upper = .FALSE., the input matrices A andB are lower triangular.

REAL for slags2a1, a2, a3DOUBLE PRECISION for dlags2On entry, a1, a2 and a3 are elements of the input 2-by-2upper (lower) triangular matrix A.

REAL for slags2b1, b2, b3DOUBLE PRECISION for dlags2On entry, b1, b2 and b3 are elements of the input 2-by-2upper (lower) triangular matrix B.

1270


Output Parameters

REAL for slags2csu, snuDOUBLE PRECISION for dlags2The desired orthogonal matrix U.

REAL for slags2csv, snvDOUBLE PRECISION for dlags2The desired orthogonal matrix V.

REAL for slags2csq, snqDOUBLE PRECISION for dlags2The desired orthogonal matrix Q.

?lagtf

Computes an LU factorization of a matrix T-λI,where T is a general tridiagonal matrix, and λ ascalar, using partial pivoting with row interchanges.

Syntax

call slagtf( n, a, lambda, b, c, tol, d, in, info )

call dlagtf( n, a, lambda, b, c, tol, d, in, info )

Description

This routine factorizes the matrix (T - lambda*I), where T is an n-by-n tridiagonal matrixand lambda is a scalar, as

T - lambda*I = P*L*U,

where p is a permutation matrix, L is a unit lower tridiagonal matrix with at most one non-zerosub-diagonal elements per column and U is an upper triangular matrix with at most two non-zerosuper-diagonal elements per column. The factorization is obtained by Gaussian elimination withpartial pivoting and implicit row scaling. The parameter lambda is included in the routine sothat ?lagtf may be used, in conjunction with ?lagts, to obtain eigenvectors of T by inverseiteration.

Input Parameters


1271


REAL for slagtfa, b, cDOUBLE PRECISION for dlagtfArrays, dimension a(n), b(n-1), c(n-1):On entry, a(*) must contain the diagonal elements of thematrix T.On entry, b(*) must contain the (n-1) super-diagonalelements of T.On entry, c(*) must contain the (n-1) sub-diagonalelements of T.

REAL for slagtftolDOUBLE PRECISION for dlagtfOn entry, a relative tolerance used to indicate whether ornot the matrix (T - lambda*I) is nearly singular. tol shouldnormally be chose as approximately the largest relativeerror in the elements of T. For example, if the elements ofT are correct to about 4 significant figures, then tol shouldbe set to about 5*10-4. If tol is supplied as less than eps,where eps is the relative machine precision, then the valueeps is used in place of tol.

Output Parameters

On exit, a is overwritten by the n diagonal elements of theupper triangular matrix U of the factorization of T.

a

On exit, b is overwritten by the n-1 super-diagonal elementsof the matrix U of the factorization of T.

b

On exit, c is overwritten by the n-1 sub-diagonal elementsof the matrix L of the factorization of T.

c

REAL for slagtfdDOUBLE PRECISION for dlagtfArray, dimension (n-2).On exit, d is overwritten by the n-2 second super-diagonalelements of the matrix U of the factorization of T.

INTEGER.inArray, dimension (n).

1272


On exit, in contains details of the permutation matrix p. Ifan interchange occurred at the k-th step of the elimination,then in(k) = 1, otherwise in(k) = 0. The element in(n)returns the smallest positive integer j such that

abs(u(j,j)) ≤ norm((T - lambda*I)(j))*tol,where norm( A(j)) denotes the sum of the absolute valuesof the j-th row of the matrix A.If no such j exists then in(n) is returned as zero. If in(n)is returned as positive, then a diagonal element of U is small,indicating that (T - lambda*I) is singular or nearlysingular.

INTEGER.infoIf info = 0, the execution is successful.If info = -k, the k-th parameter had an illegal value.

?lagtmPerforms a matrix-matrix product of the form C =alpha*A*B+beta*C, where A is a tridiagonalmatrix, B and C are rectangular matrices, andalpha and beta are scalars, which may be 0, 1,or -1.

Syntax

call slagtm( trans, n, nrhs, alpha, dl, d, du, x, ldx, beta, b, ldb )

call dlagtm( trans, n, nrhs, alpha, dl, d, du, x, ldx, beta, b, ldb )

call clagtm( trans, n, nrhs, alpha, dl, d, du, x, ldx, beta, b, ldb )

call zlagtm( trans, n, nrhs, alpha, dl, d, du, x, ldx, beta, b, ldb )

Description

This routine performs a matrix-vector product of the form:

B := alpha*A*X + beta*B

where A is a tridiagonal matrix of order n, B and X are n-by-nrhs matrices, and alpha and betaare real scalars, each of which may be 0., 1., or -1.

1273


Input Parameters

CHARACTER*1. Must be 'N' or 'T' or 'C'.transIndicates the form of the equations:If trans = 'N', then B := alpha*A*X + beta*B (notranspose);If trans = 'T', then B := alpha*AT*X + beta*B(transpose);If trans = 'C', then B := alpha*AH*X + beta*B(conjugate transpose)


INTEGER. The number of right-hand sides, i.e., the number

of columns in X and B (nrhs ≥ 0).

nrhs

REAL for slagtm/clagtmalpha, betaDOUBLE PRECISION for dlagtm/zlagtm

The scalars α and β. alpha must be 0., 1., or -1.; otherwise,it is assumed to be 0. beta must be 0., 1., or -1.; otherwise,it is assumed to be 1.

REAL for slagtmdl, d, duDOUBLE PRECISION for dlagtmCOMPLEX for clagtmCOMPLEX*16 for zlagtm.Arrays: dl(n - 1), d(n), du(n - 1).The array dl contains the (n - 1) sub-diagonal elements ofT.The array d contains the n diagonal elements of T.The array du contains the (n - 1) super-diagonal elementsof T.

REAL for slagtmx, bDOUBLE PRECISION for dlagtmCOMPLEX for clagtmCOMPLEX*16 for zlagtm.Arrays:x(ldx,*) contains the n-by-nrhs matrix X. The seconddimension of x must be at least max(1, nrhs).

1274


b(ldb,*) contains the n-by-nrhs matrix B. The seconddimension of b must be at least max(1, nrhs).

INTEGER. The leading dimension of the array x; ldx ≥max(1, n).

ldx

INTEGER. The leading dimension of the array b; ldb ≥max(1, n).

ldb

Output Parameters

Overwritten by the matrix expression B := alpha*A*X +beta*B

b

?lagtsSolves the system of equations (T - lambda*I)*x= y or (T - lambda*I)T*x = y ,where T is a generaltridiagonal matrix and lambda is a scalar, usingthe LU factorization computed by ?lagtf.

Syntax

call slagts( job, n, a, b, c, d, in, y, tol, info )

call dlagts( job, n, a, b, c, d, in, y, tol, info )

Description

This routine may be used to solve for x one of the systems of equations:

(T - lambda*I)*x = y or (T - lambda*I)'*x = y,

where T is an n-by-n tridiagonal matrix, following the factorization of (T - lambda*I) as

T - lambda*I = P*L*U,

computed by the routine ?lagtf.

The choice of equation to be solved is controlled by the argument job, and in each case thereis an option to perturb zero or very small diagonal elements of U, this option being intendedfor use in applications such as inverse iteration.

1275


Input Parameters

INTEGER. Specifies the job to be performed by ?lagts asfollows:

job

= 1: The equations (T - lambda*I)x = y are to be solved,but diagonal elements of U are not to be perturbed.= -1: The equations (T - lambda*I)x = y are to be solvedand, if overflow would otherwise occur, the diagonalelements of U are to be perturbed. See argument tol below.= 2: The equations (T - lambda*I)'x = y are to besolved, but diagonal elements of U are not to be perturbed.= -2: The equations (T - lambda*I)'x = y are to besolved and, if overflow would otherwise occur, the diagonalelements of U are to be perturbed. See argument tol below.


REAL for slagtsa, b, c, dDOUBLE PRECISION for dlagtsArrays, dimension a(n), b(n-1), c(n-1), d(n-2):On entry, a(*) must contain the diagonal elements of U asreturned from ?lagtf.On entry, b(*) must contain the first super-diagonalelements of U as returned from ?lagtf.On entry, c(*) must contain the sub-diagonal elements ofL as returned from ?lagtf.On entry, d(*) must contain the second super-diagonalelements of U as returned from ?lagtf.

INTEGER.inArray, dimension (n).On entry, in(*) must contain details of the matrix p asreturned from ?lagtf.

REAL for slagtsyDOUBLE PRECISION for dlagtsArray, dimension (n). On entry, the right hand side vectory.

REAL for slagtftolDOUBLE PRECISION for dlagtf.

1276


On entry, with job < 0, tol should be the minimumperturbation to be made to very small diagonal elements ofU. tol should normally be chosen as about eps*norm(U),where eps is the relative machine precision, but if tol issupplied as non-positive, then it is reset to eps*max( abs(u(i,j)) ). If job > 0 then tol is not referenced.

Output Parameters

On exit, y is overwritten by the solution vector x.y

On exit, tol is changed as described in Input Parameterssection above, only if tol is non-positive on entry. Otherwisetol is unchanged.

tol

INTEGER.infoIf info = 0, the execution is successful.If info = -i, the i-th parameter had an illegal value. Ifinfo = i >0, overflow would occur when computing theith element of the solution vector x. This can only occurwhen job is supplied as positive and either means that adiagonal element of U is very small, or that the elements ofthe right-hand side vector y are very large.

?lagv2Computes the Generalized Schur factorization ofa real 2-by-2 matrix pencil (A,B) where B is uppertriangular.

Syntax

call slagv2( a, lda, b, ldb, alphar, alphai, beta, csl, snl, csr, snr )

call dlagv2( a, lda, b, ldb, alphar, alphai, beta, csl, snl, csr, snr )

Description

This routine computes the Generalized Schur factorization of a real 2-by-2 matrix pencil (A,B)where B is upper triangular. The routine computes orthogonal (rotation) matrices given by csl,snl and csr, snr such that:

1) if the pencil (A,B) has two real eigenvalues (include 0/0 or 1/0 types), then

1277

2) if the pencil (A,B) has a pair of complex conjugate eigenvalues, then

where b11 ≥ b22>0.

Input Parameters

REAL for slagv2a, bDOUBLE PRECISION for dlagv2Arrays:a(lda,2) contains the 2-by-2 matrix A;b(ldb,2) contains the upper triangular 2-by-2 matrix B.

INTEGER. The leading dimension of the array a;lda

lda ≥ 2.

1278


INTEGER. The leading dimension of the array b;ldb

ldb ≥ 2.

Output Parameters

On exit, a is overwritten by the “A-part” of the generalizedSchur form.

a

On exit, b is overwritten by the “B-part” of the generalizedSchur form.

b

REAL for slagv2alphar, alphai, betaDOUBLE PRECISION for dlagv2.Arrays, dimension (2) each.(alphar(k) + i*alphai(k))/beta(k) are theeigenvalues of the pencil (A,B), k=1,2 and i = sqrt(-1).Note that beta(k) may be zero.

REAL for slagv2csl, snlDOUBLE PRECISION for dlagv2The cosine and sine of the left rotation matrix, respectively.

REAL for slagv2csr, snrDOUBLE PRECISION for dlagv2The cosine and sine of the right rotation matrix, respectively.

1279


?lahqrComputes the eigenvalues and Schur factorizationof an upper Hessenberg matrix, using thedouble-shift/single-shift QR algorithm.

Syntax

call slahqr( wantt, wantz, n, ilo, ihi, h, ldh, wr, wi, iloz, ihiz, z, ldz,info )

call dlahqr( wantt, wantz, n, ilo, ihi, h, ldh, wr, wi, iloz, ihiz, z, ldz,info )

call clahqr( wantt, wantz, n, ilo, ihi, h, ldh, w, iloz, ihiz, z, ldz, info)

call zlahqr( wantt, wantz, n, ilo, ihi, h, ldh, w, iloz, ihiz, z, ldz, info)

Description

This routine is an auxiliary routine called by ?hseqr to update the eigenvalues and Schurdecomposition already computed by ?hseqr, by dealing with the Hessenberg submatrix in rowsand columns ilo to ihi.

Input Parameters

LOGICAL.wanttIf wantt = .TRUE., the full Schur form T is required;If wantt = .FALSE., eigenvalues only are required.

LOGICAL.wantzIf wantz = .TRUE., the matrix of Schur vectors Z isrequired;If wantz = .FALSE., Schur vectors are not required.


INTEGER.ilo, ihi

1280


It is assumed that h is already upper quasi-triangular inrows and columns ihi+1:n, and that h(ilo,ilo-1) = 0(unless ilo = 1). The routine ?lahqr works primarily withthe Hessenberg submatrix in rows and columns ilo to ihi,but applies transformations to all of h if wantt = .TRUE..Constraints:

1 ≤ ilo ≤ max(1,ihi); ihi ≤ n.

REAL for slahqrh, zDOUBLE PRECISION for dlahqrCOMPLEX for clahqrCOMPLEX*16 for zlahqr.Arrays:h(ldh,*) contains the upper Hessenberg matrix h.The second dimension of h must be at least max(1, n).z(ldz,*)If wantz = .TRUE., then, on entry, z must contain thecurrent matrix z of transformations accumulated by ?hseqr.If wantz = .FALSE., then z is not referenced. The seconddimension of z must be at least max(1, n).


INTEGER. The first dimension of z; at least max(1, n).ldz

INTEGER. Specify the rows of z to which transformationsmust be applied if wantz = .TRUE..

iloz, ihiz

1 ≤ iloz ≤ ilo; ihi ≤ ihiz ≤ n.

Output Parameters

On exit, if info= 0 and wantt = .TRUE., then,h

• for slahqr/dlahqr, h is upper quasi-triangular in rowsand columns ilo:ihi with any 2-by-2 diagonal blocksin standard form.

• for clahqr/zlahqr, h is upper triangular in rows andcolumns ilo:ihi.

1281


If info= 0 and wantt = .FALSE., the contents of h areunspecified on exit. If info is positive, see description ofinfo for the output state of h.

REAL for slahqrwr, wiDOUBLE PRECISION for dlahqrArrays, DIMENSION at least max(1, n) each. Used with realflavors only. The real and imaginary parts, respectively, ofthe computed eigenvalues ilo to ihi are stored in thecorresponding elements of wr and wi. If two eigenvaluesare computed as a complex conjugate pair, they are storedin consecutive elements of wr and wi, say the i-th and(i+1)-th, with wi(i)> 0 and wi(i+1) < 0.If wantt = .TRUE., the eigenvalues are stored in the sameorder as on the diagonal of the Schur form returned in h,with wr(i) = h(i,i), and, if h(i:i+1, i:i+1) is a 2-by-2diagonal block, wi(i) = sqrt(h(i+1,i)*h(i,i+1)) andwi(i+1) = -wi(i).

COMPLEX for clahqrwCOMPLEX*16 for zlahqr.Array, DIMENSION at least max(1, n). Used with complexflavors only. The computed eigenvalues ilo to ihi arestored in the corresponding elements of w.If wantt = .TRUE., the eigenvalues are stored in the sameorder as on the diagonal of the Schur form returned in h,with w(i) = h(i,i).

If wantz = .TRUE., then, on exit z has been updated;transformations are applied only to the submatrixz(iloz:ihiz, ilo:ihi).

z

INTEGER.infoIf info = 0, the execution is successful.With info > 0,

• if info = i, ?lahqr failed to compute all the eigenvaluesilo to ihi in a total of 30 iterations per eigenvalue;elements i+1:ihi of wr and wi (for slahqr/dlahqr) orw (for clahqr/zlahqr) contain those eigenvalues whichhave been successfully computed.

1282

• if wantt is .FALSE., then on exit the remainingunconverged eigenvalues are the eigenvalues of theupper Hessenberg matrix rows and columns ilo thorughinfo of the final output value of h.

• if wantt is .TRUE., then on exit

(initial value of h)*u = u*(final value of h),(*)

where u is an orthognal matrix. The final value of h isupper Hessenberg and triangular in rows and columnsinfo+1 through ihi.

• if wantz is .TRUE., then on exit

(final value of z) = (initial value of z)* u,

where u is an orthognal matrix in (*) regardless of thevalue of wantt.

?lahrdReduces the first nb columns of a generalrectangular matrix A so that elements below thek-th subdiagonal are zero, and returns auxiliarymatrices which are needed to apply thetransformation to the unreduced part of A.

Syntax

call slahrd( n, k, nb, a, lda, tau, t, ldt, y, ldy )

call dlahrd( n, k, nb, a, lda, tau, t, ldt, y, ldy )

call clahrd( n, k, nb, a, lda, tau, t, ldt, y, ldy )

call zlahrd( n, k, nb, a, lda, tau, t, ldt, y, ldy )

1283


Description

The routine reduces the first nb columns of a real/complex general n-by-(n-k+1) matrix A sothat elements below the k-th subdiagonal are zero. The reduction is performed by anorthogonal/unitary similarity transformation Q'*A*Q. The routine returns the matrices V and Twhich determine Q as a block reflector I - V*T*V', and also the matrix Y = A*V*T.

The matrix Q is represented as products of nb elementary reflectors:

Q = H(1) H(2)... H(nb)


H(i) = I - tau*v*v'

where tau is a real/complex scalar, and v is a real/complex vector.

This is an obsolete auxiliary routine. Please use the new routine ?lahr2 instead.

Input Parameters


INTEGER. The offset for the reduction. Elements below thek-th subdiagonal in the first nb columns are reduced to zero.

k

INTEGER. The number of columns to be reduced.nb

REAL for slahrdaDOUBLE PRECISION for dlahrdCOMPLEX for clahrdCOMPLEX*16 for zlahrd.Array a(lda, n-k+1) contains the n-by-(n-k+1) generalmatrix A to be reduced.


INTEGER. The first dimension of the output array t; mustbe at least max(1, nb).

ldt

INTEGER. The first dimension of the output array y; mustbe at least max(1, n).

ldy

1284


Output Parameters

On exit, the elements on and above the k-th subdiagonalin the first nb columns are overwritten with thecorresponding elements of the reduced matrix; the elements

a

below the k-th subdiagonal, with the array tau, representthe matrix Q as a product of elementary reflectors. The othercolumns of a are unchanged. See Application Notes below.

REAL for slahrdtauDOUBLE PRECISION for dlahrdCOMPLEX for clahrdCOMPLEX*16 for zlahrd.Array, DIMENSION (nb).Contains scalar factors of the elementary reflectors.

REAL for slahrdt, yDOUBLE PRECISION for dlahrdCOMPLEX for clahrdCOMPLEX*16 for zlahrd.Arrays, dimension t(ldt, nb), y(ldy, nb).The array t contains upper triangular matrix T.The array y contains the n-by-nb matrix Y .

Application Notes

For the elementary reflector H(i),

v(1:i+k-1) = 0, v(i+k) = 1; v(i+k+1:n) is stored on exit in a(i+k+1:n, i) and tau isstored in tau(i).

The elements of the vectors v together form the (n-k+1)-by-nb matrix V which is needed, withT and Y, to apply the transformation to the unreduced part of the matrix, using an update ofthe form:

A := (I - V T V') * (A - Y V').

The contents of A on exit are illustrated by the following example with n = 7, k = 3 and nb= 2:

1285



?lahr2Reduces the specified number of first columns ofa general rectangular matrix A so that elementsbelow the specified subdiagonal are zero, andreturns auxiliary matrices which are needed toapply the transformation to the unreduced part ofA.

Syntax

call slahr2( n, k, nb, a, lda, tau, t, ldt, y, ldy )

call dlahr2( n, k, nb, a, lda, tau, t, ldt, y, ldy )

call clahr2( n, k, nb, a, lda, tau, t, ldt, y, ldy )

call zlahr2( n, k, nb, a, lda, tau, t, ldt, y, ldy )

Description

The routine reduces the first nb columns of a real/complex general n-by-(n-k+1) matrix A sothat elements below the k-th subdiagonal are zero. The reduction is performed by anorthogonal/unitary similarity transformation Q'*A*Q. The routine returns the matrices V and Twhich determine Q as a block reflector I - V*T*V', and also the matrix Y = A*V*T.

1286


The matrix Q is represented as products of nb elementary reflectors:

Q = H(1) H(2)... H(nb)


H(i) = I - tau*v*v'

where tau is a real/complex scalar, and v is a real/complex vector.

This is an auxiliary routine called by ?gehrd.

Input Parameters


INTEGER. The offset for the reduction. Elements below thek-th subdiagonal in the first nb columns are reduced to zero(k < n).

k

INTEGER. The number of columns to be reduced.nb

REAL for slahr2aDOUBLE PRECISION for dlahr2COMPLEX for clahr2COMPLEX*16 for zlahr2.Array, DIMENSION (lda, n-k+1) contains the n-by-(n-k+1)general matrix A to be reduced.

INTEGER. The first dimension of the array a; lda ≥ max(1,n).

lda

INTEGER. The first dimension of the output array t; ldt ≥nb.

ldt

INTEGER. The first dimension of the output array y; ldy ≥n.

ldy

Output Parameters

On exit, the elements on and above the k-th subdiagonalin the first nb columns are overwritten with thecorresponding elements of the reduced matrix; the elements

a

below the k-th subdiagonal, with the array tau, representthe matrix Q as a product of elementary reflectors. The othercolumns of a are unchanged. See Application Notes below.

1287

REAL for slahr2tauDOUBLE PRECISION for dlahr2COMPLEX for clahr2COMPLEX*16 for zlahr2.Array, DIMENSION (nb).Contains scalar factors of the elementary reflectors.

REAL for slahr2t, yDOUBLE PRECISION for dlahr2COMPLEX for clahr2COMPLEX*16 for zlahr2.Arrays, dimension t(ldt, nb), y(ldy, nb).The array t contains upper triangular matrix T.The array y contains the n-by-nb matrix Y .

Application Notes

For the elementary reflector H(i),

v(1:i+k-1) = 0, v(i+k) = 1; v(i+k+1:n) is stored on exit in a(i+k+1:n, i) and tau isstored in tau(i).

The elements of the vectors v together form the (n-k+1)-by-nb matrix V which is needed, withT and Y, to apply the transformation to the unreduced part of the matrix, using an update ofthe form:

A := (I - V*T*V') * (A - Y*V').

The contents of A on exit are illustrated by the following example with n = 7, k = 3 and nb= 2:

1288



?laic1Applies one step of incremental conditionestimation.

Syntax

call slaic1( job, j, x, sest, w, gamma, sestpr, s, c )

call dlaic1( job, j, x, sest, w, gamma, sestpr, s, c )

call claic1( job, j, x, sest, w, gamma, sestpr, s, c )

call zlaic1( job, j, x, sest, w, gamma, sestpr, s, c )

Description

The routine ?laic1 applies one step of incremental condition estimation in its simplest version.

Let x, ||x||2 = 1 (where ||a||2 denotes the 2-norm of a), be an approximate singular vectorof an j-by-j lower triangular matrix L, such that

||L*x||2 = sest

Then ?laic1 computes sestpr, s, c such that the vector

is an approximate singular vector of

1289


in the sense that

||Lhat*xhat||2 = sestpr.

Depending on job, an estimate for the largest or smallest singular value is computed.

Note that [s c]' and sestpr2 is an eigenpair of the system (for slaic1/claic)

where alpha = x'*w ;

or of the system (for claic1/zlaic)

where alpha = conjg(x)'*w.

Input Parameters

INTEGER.jobIf job =1, an estimate for the largest singular value iscomputed;If job =2, an estimate for the smallest singular value iscomputed;

INTEGER. Length of x and w.j

REAL for slaic1x, wDOUBLE PRECISION for dlaic1COMPLEX for claic1COMPLEX*16 for zlaic1.Arrays, dimension (j) each. Contain vectors x and w,respectively.

REAL for slaic1/claic1;sest

1290


DOUBLE PRECISION for dlaic1/zlaic1.Estimated singular value of j-by-j matrix L.

REAL for slaic1gammaDOUBLE PRECISION for dlaic1COMPLEX for claic1COMPLEX*16 for zlaic1.The diagonal element gamma.

Output Parameters

REAL for slaic1/claic1;sestprDOUBLE PRECISION for dlaic1/zlaic1.Estimated singular value of (j+1)-by-(j+1) matrix Lhat.

REAL for slaic1s, cDOUBLE PRECISION for dlaic1COMPLEX for claic1COMPLEX*16 for zlaic1.Sine and cosine needed in forming xhat.

?laln2Solves a 1-by-1 or 2-by-2 linear system ofequations of the specified form.

Syntax

call slaln2( ltrans, na, nw, smin, ca, a, lda, d1, d2, b, ldb, wr, wi, x, ldx,scale, xnorm, info )

call dlaln2( ltrans, na, nw, smin, ca, a, lda, d1, d2, b, ldb, wr, wi, x, ldx,scale, xnorm, info )

Description

The routine solves a system of the form

(ca*A - w*D)*X = s*B, or (ca*A' - w*D)*X = s*B

with possible scaling (s) and perturbation of A (A' means A-transpose.)

1291


A is an na-by-na real matrix, ca is a real scalar, D is an na-by-na real diagonal matrix, w is areal or complex value, and X and B are na-by-1 matrices: real if w is real, complex if w is complex.The parameter na may be 1 or 2.

If w is complex, X and B are represented as na-by-2 matrices, the first column of each beingthe real part and the second being the imaginary part.

The routine computes the scaling factor s ( ≤ 1 ) so chosen that X can be computed withoutoverflow. X is further scaled if necessary to assure that norm(ca*A - w*D)*norm(X) is lessthan overflow.

If both singular values of (ca&*A - w*D) are less than smin, smin*I (where I stands foridentity) will be used instead of (ca*A - w*D). If only one singular value is less than smin, oneelement of (ca*A - w*D) will be perturbed enough to make the smallest singular value roughlysmin.

If both singular values are at least smin, (ca*A - w*D) will not be perturbed. In any case,the perturbation will be at most some small multiple of max(smin, ulp*norm(ca*A - w*D)).

The singular values are computed by infinity-norm approximations, and thus will only be correctto a factor of 2 or so.

NOTE. All input quantities are assumed to be smaller than overflow by a reasonablefactor (see bignum).

Input Parameters

LOGICAL.transIf trans = .TRUE., A- transpose will be used.If trans = .FALSE., A will be used (not transposed.)

INTEGER. The size of the matrix A, possible values 1 or 2.na

INTEGER. This parameter must be 1 if w is real, and 2 if wis complex. Possible values 1 or 2.

nw

REAL for slaln2sminDOUBLE PRECISION for dlaln2.The desired lower bound on the singular values of A.

1292


This should be a safe distance away from underflow oroverflow, for example, between(underflow/machine_precision) and(machine_precision * overflow). (See bignum and ulp).

REAL for slaln2caDOUBLE PRECISION for dlaln2.The coefficient by which A is multiplied.

REAL for slaln2aDOUBLE PRECISION for dlaln2.Array, DIMENSION (lda,na).The na-by-na matrix A.

INTEGER. The leading dimension of a. Must be at least na.lda

REAL for slaln2d1, d2DOUBLE PRECISION for dlaln2.The (1,1) and (2,2) elements in the diagonal matrix D,respectively. d2 is not used if nw = 1.

REAL for slaln2bDOUBLE PRECISION for dlaln2.Array, DIMENSION (ldb,nw). The na-by-nw matrix B(right-hand side). If nw =2 (w is complex), column 1 containsthe real part of B and column 2 contains the imaginary part.

INTEGER. The leading dimension of b. Must be at least na.ldb

REAL for slaln2wr, wiDOUBLE PRECISION for dlaln2.The real and imaginary part of the scalar w, respectively.wi is not used if nw = 1.

INTEGER. The leading dimension of the output array x. Mustbe at least na.

ldx

Output Parameters

REAL for slaln2xDOUBLE PRECISION for dlaln2.Array, DIMENSION (ldx,nw). The na-by-nw matrix X(unknowns), as computed by the routine. If nw = 2 (w iscomplex), on exit, column 1 will contain the real part of Xand column 2 will contain the imaginary part.

1293


REAL for slaln2scaleDOUBLE PRECISION for dlaln2.The scale factor that B must be multiplied by to insure thatoverflow does not occur when computing X. Thus (ca*A -w*D) X will be scale*B, not B (ignoring perturbations ofA.) It will be at most 1.

REAL for slaln2xnormDOUBLE PRECISION for dlaln2.The infinity-norm of X, when X is regarded as an na-by-nwreal matrix.

INTEGER.infoAn error flag. It will be zero if no error occurs, a negativenumber if an argument is in error, or a positive number if(ca*A - w*D) had to be perturbed. The possible valuesare:If info = 0: no error occurred, and (ca*A - w*D) did nothave to be perturbed.If info = 1: (ca*A - w*D) had to be perturbed to makeits smallest (or only) singular value greater than smin.

NOTE. For higher speed, this routine does not check the inputs for errors.

1294


?lals0Applies back multiplying factors in solving the leastsquares problem using divide and conquer SVDapproach. Used by ?gelsd.

Syntax

call slals0 ( icompq, nl, nr, sqre, nrhs, b, ldb, bx, ldbx, perm, givptr,givcol, ldgcol, givnum, ldgnum, poles, difl, difr, z, k, c, s, work, info )

call dlals0 ( icompq, nl, nr, sqre, nrhs, b, ldb, bx, ldbx, perm, givptr,givcol, ldgcol, givnum, ldgnum, poles, difl, difr, z, k, c, s, work, info )

call clals0( icompq, nl, nr, sqre, nrhs, b, ldb, bx, ldbx, perm, givptr,givcol, ldgcol, givnum, ldgnum, poles, difl, difr, z, k, c, s, rwork, info )

call zlals0( icompq, nl, nr, sqre, nrhs, b, ldb, bx, ldbx, perm, givptr,givcol, ldgcol, givnum, ldgnum, poles, difl, difr, z, k, c, s, rwork, info )

Description

The routine applies back the multiplying factors of either the left or right singular vector matrixof a diagonal matrix appended by a row to the right hand side matrix B in solving the leastsquares problem using the divide-and-conquer SVD approach.

For the left singular vector matrix, three types of orthogonal matrices are involved:

(1L) Givens rotations: the number of such rotations is givptr;the pairs of columns/rows theywere applied to are stored in givcol;and the c- and s-values of these rotations are stored ingivnum.

(2L) Permutation. The (nl+1)-st row of B is to be moved to the first row, and for j=2:n,perm(j)-th row of B is to be moved to the j-th row.

(3L) The left singular vector matrix of the remaining matrix.

For the right singular vector matrix, four types of orthogonal matrices are involved:

(1R) The right singular vector matrix of the remaining matrix.

(2R) If sqre = 1, one extra Givens rotation to generate the right null space.

(3R) The inverse transformation of (2L).

(4R) The inverse transformation of (1L).

1295


Input Parameters

INTEGER. Specifies whether singular vectors are to becomputed in factored form:

icompq

If icompq = 0: Left singular vector matrix.If icompq = 1: Right singular vector matrix.

INTEGER. The row dimension of the upper block.nl

nl ≥ 1.

INTEGER. The row dimension of the lower block.nr

nr ≥ 1.

INTEGER.sqreIf sqre = 0: the lower block is an nr-by-nr square matrix.If sqre = 1: the lower block is an nr-by-(nr+1) rectangularmatrix. The bidiagonal matrix has row dimension n = nl+ nr + 1, and column dimension m = n + sqre.

INTEGER. The number of columns of B and bx.nrhsMust be at least 1.

REAL for slals0bDOUBLE PRECISION for dlals0COMPLEX for clals0COMPLEX*16 for zlals0.Array, DIMENSION ( ldb, nrhs ).Contains the right hand sides of the least squares problemin rows 1 through m.

INTEGER. The leading dimension of b.ldbMust be at least max(1,max( m, n )).

REAL for slals0bxDOUBLE PRECISION for dlals0COMPLEX for clals0COMPLEX*16 for zlals0.Workspace array, DIMENSION ( ldbx, nrhs ).

INTEGER. The leading dimension of bx.ldbx

INTEGER. Array, DIMENSION (n).permThe permutations (from deflation and sorting) applied tothe two blocks.

1296


INTEGER. The number of Givens rotations which took placein this subproblem.

givptr

INTEGER. Array, DIMENSION ( ldgcol, 2 ). Each pair ofnumbers indicates a pair of rows/columns involved in aGivens rotation.

givcol

INTEGER. The leading dimension of givcol, must be at leastn.

ldgcol

REAL for slals0/clals0givnumDOUBLE PRECISION for dlals0/zlals0Array, DIMENSION ( ldgnum, 2 ). Each number indicates thec or s value used in the corresponding Givens rotation.

INTEGER. The leading dimension of arrays difr, poles andgivnum, must be at least k.

ldgnum

REAL for slals0/clals0polesDOUBLE PRECISION for dlals0/zlals0Array, DIMENSION ( ldgnum, 2 ). On entry, poles(1:k, 1)contains the new singular values obtained from solving thesecular equation, and poles(1:k, 2) is an array containingthe poles in the secular equation.

REAL for slals0/clals0diflDOUBLE PRECISION for dlals0/zlals0Array, DIMENSION ( k ). On entry, difl(i) is the distancebetween i-th updated (undeflated) singular value and thei-th (undeflated) old singular value.

REAL for slals0/clals0difrDOUBLE PRECISION for dlals0/zlals0Array, DIMENSION ( ldgnum, 2 ). On entry, difr(i, 1)contains the distances between i-th updated (undeflated)singular value and the i+1-th (undeflated) old singularvalue. And difr(i, 2) is the normalizing factor for the i-thright singular vector.

REAL for slals0/clals0zDOUBLE PRECISION for dlals0/zlals0Array, DIMENSION ( k ). Contains the components of thedeflation-adjusted updating row vector.

1297


INTEGER. Contains the dimension of the non-deflated matrix.

This is the order of the related secular equation. 1 ≤ k ≤n.

K

REAL for slals0/clals0cDOUBLE PRECISION for dlals0/zlals0Contains garbage if sqre =0 and the c value of a Givensrotation related to the right null space if sqre = 1.

REAL for slals0/clals0sDOUBLE PRECISION for dlals0/zlals0Contains garbage if sqre =0 and the s value of a Givensrotation related to the right null space if sqre = 1.

REAL for slals0workDOUBLE PRECISION for dlals0Workspace array, DIMENSION ( k ). Used with real flavorsonly.

REAL for clals0rworkDOUBLE PRECISION for zlals0Workspace array, DIMENSION (k*(1+nrhs) + 2*nrhs). Usedwith complex flavors only.

Output Parameters

On exit, contains the solution X in rows 1 through n.b

INTEGER.infoIf info = 0: successful exit.If info = -i < 0, the i-th argument had an illegal value.

1298

?lalsaComputes the SVD of the coefficient matrix incompact form. Used by ?gelsd.

Syntax

call slalsa( icompq, smlsiz, n, nrhs, b, ldb, bx, ldbx, u, ldu, vt, k, difl,difr, z, poles, givptr, givcol, ldgcol, perm, givnum, c, s, work, iwork, info)

call dlalsa( icompq, smlsiz, n, nrhs, b, ldb, bx, ldbx, u, ldu, vt, k, difl,difr, z, poles, givptr, givcol, ldgcol, perm, givnum, c, s, work, iwork, info)

call clalsa( icompq, smlsiz, n, nrhs, b, ldb, bx, ldbx, u, ldu, vt, k, difl,difr, z, poles, givptr, givcol, ldgcol, perm, givnum, c, s, rwork, iwork,info )

call zlalsa( icompq, smlsiz, n, nrhs, b, ldb, bx, ldbx, u, ldu, vt, k, difl,difr, z, poles, givptr, givcol, ldgcol, perm, givnum, c, s, rwork, iwork,info )

Description

The routine is an itermediate step in solving the least squares problem by computing the SVDof the coefficient matrix in compact form. The singular vectors are computed as products ofsimple orthorgonal matrices.

If icompq = 0, ?lalsa applies the inverse of the left singular vector matrix of an upperbidiagonal matrix to the right hand side; and if icompq = 1, the routine applies the rightsingular vector matrix to the right hand side. The singular vector matrices were generated inthe compact form by ?lalsa.

Input Parameters

INTEGER. Specifies whether the left or the right singularvector matrix is involved. If icompq = 0: left singular vectormatrix is used

icompq

If icompq = 1: right singular vector matrix is used.

INTEGER. The maximum size of the subproblems at thebottom of the computation tree.

smlsiz

1299


INTEGER. The row and column dimensions of the upperbidiagonal matrix.

n

INTEGER. The number of columns of b and bx. Must be atleast 1.

nrhs

REAL for slalsabDOUBLE PRECISION for dlalsaCOMPLEX for clalsaCOMPLEX*16 for zlalsaArray, DIMENSION (ldb, nrhs). Contains the right hand sidesof the least squares problem in rows 1 through m.

INTEGER. The leading dimension of b in the callingsubprogram. Must be at least max(1,max( m, n )).

ldb

INTEGER. The leading dimension of the output array bx.ldbx

REAL for slalsa/clalsauDOUBLE PRECISION for dlalsa/zlalsaArray, DIMENSION (ldu, smlsiz). On entry, u contains theleft singular vector matrices of all subproblems at the bottomlevel.

INTEGER, ldu ≥ n. The leading dimension of arrays u, vt,difl, difr, poles, givnum, and z.

ldu

REAL for slalsa/clalsavtDOUBLE PRECISION for dlalsa/zlalsaArray, DIMENSION (ldu, smlsiz +1). On entry, containsthe right singular vector matrices of all subproblems at thebottom level.

INTEGER array, DIMENSION ( n ).k

REAL for slalsa/clalsadiflDOUBLE PRECISION for dlalsa/zlalsaArray, DIMENSION (ldu, nlvl), where nlvl = int(log2(n/(smlsiz+1))) + 1.

REAL for slalsa/clalsadifrDOUBLE PRECISION for dlalsa/zlalsa

1300


Array, DIMENSION (ldu, 2*nlvl). On entry, difl(*, i)and difr(*, 2i -1) record distances between singular valueson the i-th level and singular values on the (i -1)-th level,and difr(*, 2i) record the normalizing factors of the rightsingular vectors matrices of subproblems on i-th level.

REAL for slalsa/clalsazDOUBLE PRECISION for dlalsa/zlalsaArray, DIMENSION (ldu, nlvl . On entry, z(1, i) containsthe components of the deflation- adjusted updating the rowvector for subproblems on the i-th level.

REAL for slalsa/clalsapolesDOUBLE PRECISION for dlalsa/zlalsaArray, DIMENSION (ldu, 2*nlvl).On entry, poles(*, 2i-1: 2i) contains the new and oldsingular values involved in the secular equations on the i-thlevel.

INTEGER. Array, DIMENSION ( n ).givptrOn entry, givptr( i ) records the number of Givensrotations performed on the i-th problem on the computationtree.

INTEGER. Array, DIMENSION ( ldgcol, 2*nlvl ). On entry,for each i, givcol(*, 2i-1: 2i) records the locations ofGivens rotations performed on the i-th level on thecomputation tree.

givcol

INTEGER, ldgcol ≥ n. The leading dimension of arraysgivcol and perm.

ldgcol

INTEGER. Array, DIMENSION ( ldgcol, nlvl ). On entry,perm(*, i) records permutations done on the i-th level ofthe computation tree.

perm

REAL for slalsa/clalsagivnumDOUBLE PRECISION for dlalsa/zlalsaArray, DIMENSION (ldu, 2*nlvl). On entry, givnum(*, 2i-1: 2i) records the c and s values of Givens rotationsperformed on the i-th level on the computation tree.

REAL for slalsa/clalsacDOUBLE PRECISION for dlalsa/zlalsa

1301


Array, DIMENSION ( n ). On entry, if the i-th subproblemis not square, c( i ) contains the c value of a Givens rotationrelated to the right null space of the i-th subproblem.

REAL for slalsa/clalsasDOUBLE PRECISION for dlalsa/zlalsaArray, DIMENSION ( n ). On entry, if the i-th subproblemis not square, s( i ) contains the s-value of a Givens rotationrelated to the right null space of the i-th subproblem.

REAL for slalsaworkDOUBLE PRECISION for dlalsaWorkspace array, DIMENSION at least (n). Used with realflavors only.

REAL for clalsarworkDOUBLE PRECISION for zlalsaWorkspace array, DIMENSION at least max(n,(smlsz+1)*nrhs*3). Used with complex flavors only.

INTEGER.iworkWorkspace array, DIMENSION at least (3n).

Output Parameters

On exit, contains the solution X in rows 1 through n.b

REAL for slalsabxDOUBLE PRECISION for dlalsaCOMPLEX for clalsaCOMPLEX*16 for zlalsaArray, DIMENSION (ldbx, nrhs). On exit, the result ofapplying the left or right singular vector matrix to b.

INTEGER. If info = 0: successful exitinfoIf info = -i < 0, the i-th argument had an illegal value.

1302

?lalsdUses the singular value decomposition of A to solvethe least squares problem.

Syntax

call slalsd( uplo, smlsiz, n, nrhs, d, e, b, ldb, rcond, rank, work, iwork,info )

call dlalsd( uplo, smlsiz, n, nrhs, d, e, b, ldb, rcond, rank, work, iwork,info )

call clalsd( uplo, smlsiz, n, nrhs, d, e, b, ldb, rcond, rank, work, rwork,iwork, info )

call zlalsd( uplo, smlsiz, n, nrhs, d, e, b, ldb, rcond, rank, work, rwork,iwork, info )

Description

The routine uses the singular value decomposition of A to solve the least squares problem offinding X to minimize the Euclidean norm of each column of A*X-B, where A is n-by-n upperbidiagonal, and X and B are n-by-nrhs. The solution X overwrites B.

The singular values of A smaller than rcond times the largest singular value are treated as zeroin solving the least squares problem; in this case a minimum norm solution is returned. Theactual singular values are returned in d in ascending order.

This code makes very mild assumptions about floating point arithmetic. It will work on machineswith a guard digit in add/subtract, or on those binary machines without guard digits whichsubtract like the Cray XMP, Cray YMP, Cray C 90, or Cray 2.

It could conceivably fail on hexadecimal or decimal machines without guard digits, but we knowof none.

Input Parameters

CHARACTER*1.uploIf uplo = 'U', d and e define an upper bidiagonal matrix.If uplo = 'L', d and e define a lower bidiagonal matrix.

INTEGER. The maximum size of the subproblems at thebottom of the computation tree.

smlsiz

1303


INTEGER. The dimension of the bidiagonal matrix.n

n ≥ 0.

INTEGER. The number of columns of B. Must be at least 1.nrhs

REAL for slalsd/clalsddDOUBLE PRECISION for dlalsd/zlalsdArray, DIMENSION (n). On entry, d contains the maindiagonal of the bidiagonal matrix.

REAL for slalsd/clalsdeDOUBLE PRECISION for dlalsd/zlalsdArray, DIMENSION (n-1). Contains the super-diagonal entriesof the bidiagonal matrix. On exit, e is destroyed.

REAL for slalsdbDOUBLE PRECISION for dlalsdCOMPLEX for clalsdCOMPLEX*16 for zlalsdArray, DIMENSION (ldb,nrhs).On input, b contains the right hand sides of the least squaresproblem. On output, b contains the solution X.

INTEGER. The leading dimension of b in the callingsubprogram. Must be at least max(1,n).

ldb

REAL for slalsd/clalsdrcondDOUBLE PRECISION for dlalsd/zlalsdThe singular values of A less than or equal to rcond timesthe largest singular value are treated as zero in solving theleast squares problem. If rcond is negative, machineprecision is used instead. For example, for the least squaresproblem diag(S)*X=B, where diag(S) is a diagonal matrixof singular values, the solution is X(i)=B(i)/S(i) if S(i)is greater than rcond *max(S), and X(i)=0 if S(i) is lessthan or equal to rcond *max(S).

INTEGER. The number of singular values of A greater thanrcond times the largest singular value.

rank

REAL for slalsdworkDOUBLE PRECISION for dlalsdCOMPLEX for clalsdCOMPLEX*16 for zlalsd

1304


Workspace array.DIMENSION for real flavors at least(9n+2n*smlsiz+8n*nlvl+n*nrhs+(smlsiz+1)2),wherenlvl = max(0, int(log2(n/(smlsiz+1))) + 1).DIMENSION for complex flavors is (n*nrhs).

REAL for clalsdrworkDOUBLE PRECISION for zlalsdWorkspace array, used with complex flavors only.DIMENSION at least (9n + 2n*smlsiz + 8n*nlvl +3*mlsiz*nrhs + (smlsiz+1)2),wherenlvl = max(0, int(log2(min(m,n)/(smlsiz+1))) +1).

INTEGER.iworkWorkspace array of DIMENSION (3n*nlvl + 11n).

Output Parameters

On exit, if info = 0, d contains singular values of thebidiagonal matrix.

d

On exit, destroyed.e

On exit, b contains the solution X.b

INTEGER.infoIf info = 0: successful exit.If info = -i < 0, the i-th argument had an illegal value.If info > 0: The algorithm failed to compute a singularvalue while working on the submatrix lying in rows andcolumns info/(n+1) through mod(info,n+1).

1305

?lamrgCreates a permutation list to merge the entries oftwo independently sorted sets into a single setsorted in acsending order.

Syntax

call slamrg( n1, n2, a, strd1, strd2, index )

call dlamrg( n1, n2, a, strd1, strd2, index )

Description

The routine creates a permutation list which will merge the elements of a (which is composedof two independently sorted sets) into a single set which is sorted in ascending order.

Input Parameters

INTEGER. These arguments contain the respective lengthsof the two sorted lists to be merged.

n1, n2

REAL for slamrgaDOUBLE PRECISION for dlamrg.Array, DIMENSION (n1+n2).The first n1 elements of a contain a list of numbers whichare sorted in either ascending or descending order. Likewisefor the final n2 elements.

INTEGER.strd1, strd2These are the strides to be taken through the array a.Allowable strides are 1 and -1. They indicate whether asubset of a is sorted in ascending (strdx = 1) ordescending (strdx = -1) order.

Output Parameters

INTEGER. Array, DIMENSION (n1+n2).indexOn exit, this array will contain a permutation such that ifb(i) = a(index(i)) for i=1, n1+n2, then b will be sortedin ascending order.

1306


?lanegComputes the Sturm count, the number of negativepivots encountered while factoring tridiagonalT-sigma*I = L*D*LT.

Syntax

value = slaneg( n, d, lld, sigma, pivmin, r )

value = dlaneg( n, d, lld, sigma, pivmin, r )

Description

The routine computes the Sturm count, the number of negative pivots encountered whilefactoring tridiagonal T-sigma*I = L*D*LT. This implementation works directly on the factorswithout forming the tridiagonal matrix T. The Sturm count is also the number of eigenvaluesof T less than sigma. This routine is called from ?larb. The current routine does not use thepivmin parameter but rather requires IEEE-754 propagation of infinities and NaNs (NaN standsfor 'Not A Number'). This routine also has no input range restrictions but does require defaultexception handling such that x/0 produces Inf when x is non-zero, and Inf/Inf producesNaN. (For more information see [Marques06]).

Input Parameters

INTEGER. The order of the matrix.n

REAL for slanegdDOUBLE PRECISION for dlanegArray, DIMENSION (n).Contains n diagonal elements of the matrix D.

REAL for slaneglldDOUBLE PRECISION for dlanegArray, DIMENSION (n-1).Contains (n-1) elements L(i)*L(i)*D(i).

REAL for slanegsigmaDOUBLE PRECISION for dlanegShift amount in T-sigma*I = L*D*L**T.

REAL for slanegpivminDOUBLE PRECISION for dlaneg

1307


The minimum pivot in the Sturm sequence. May be usedwhen zero pivots are encountered on non-IEEE-754architectures.

INTEGER.rThe twist index for the twisted factorization that is used forthe negcount.

Output Parameters

INTEGER. The number of negative pivots encountered whilefactoring.

value

?langbReturns the value of the 1-norm, Frobenius norm,infinity-norm, or the largest absolute value of anyelement of general band matrix.

Syntax

val = slangb( norm, n, kl, ku, ab, ldab, work )

val = dlangb( norm, n, kl, ku, ab, ldab, work )

val = clangb( norm, n, kl, ku, ab, ldab, work )

val = zlangb( norm, n, kl, ku, ab, ldab, work )

Description

The function returns the value of the 1-norm, or the Frobenius norm, or the infinity norm, orthe element of largest absolute value of an n-by-n band matrix A, with kl sub-diagonals andku super-diagonals.

The value val returned by the function is:

val = max(abs(Aij)), if norm = 'M' or 'm'

= norm1(A), if norm = '1' or 'O' or 'o'

= normI(A), if norm = 'I' or 'i'

= normF(A), if norm = 'F', 'f', 'E' or 'e'

1308


where norm1 denotes the 1-norm of a matrix (maximum column sum), normI denotes theinfinity norm of a matrix (maximum row sum) and normF denotes the Frobenius norm of amatrix (square root of sum of squares). Note that max(abs(Aij)) is not a consistent matrixnorm.

Input Parameters

CHARACTER*1. Specifies the vaule to be returned by theroutine as described above.

norm

INTEGER. The order of the matrix A. n ≥ 0. When n = 0,?langb is set to zero.

n

INTEGER. The number of sub-diagonals of the matrix A. kl

≥ 0.

kl

INTEGER. The number of super-diagonals of the matrix A.

ku ≥ 0.

ku

REAL for slangbabDOUBLE PRECISION for dlangbCOMPLEX for clangbCOMPLEX*16 for zlangbArray, DIMENSION (ldab,n).The band matrix A, stored in rows 1 to kl+ku+1. The j-thcolumn of A is stored in the j-th column of the array ab asfollows:ab(ku+1+i-j,j) = a(i,j)

for max(1,j-ku) ≤ i ≤ min(n,j+kl).

INTEGER. The leading dimension of the array ab.ldab

ldab ≥ kl+ku+1.

REAL for slangb/clangbworkDOUBLE PRECISION for dlangb/zlangbWorkspace array, DIMENSION (max(1,lwork)), where

lwork ≥ n when norm = 'I'; otherwise, work is notreferenced.

Output Parameters

REAL for slangb/clangbval

1309


DOUBLE PRECISION for dlangb/zlangbValue returned by the function.

?langeReturns the value of the 1-norm, Frobenius norm,infinity-norm, or the largest absolute value of anyelement of a general rectangular matrix.

Syntax

val = slange( norm, m, n, a, lda, work )

val = dlange( norm, m, n, a, lda, work )

val = clange( norm, m, n, a, lda, work )

val = zlange( norm, m, n, a, lda, work )

Description

The function ?lange returns the value of the 1-norm, or the Frobenius norm, or the infinitynorm, or the element of largest absolute value of a real/complex matrix A.







Input Parameters


norm

INTEGER. The number of rows of the matrix A.m

m ≥ 0. When m = 0, ?lange is set to zero.

1310


INTEGER. The number of columns of the matrix A.n

n ≥ 0. When n = 0, ?lange is set to zero.

REAL for slangeaDOUBLE PRECISION for dlangeCOMPLEX for clangeCOMPLEX*16 for zlangeArray, DIMENSION (lda,n).The m-by-n matrix A.

INTEGER. The leading dimension of the array a.lda

lda ≥ max(m,1).

REAL for slange and clange.workDOUBLE PRECISION for dlange and zlange.Workspace array, DIMENSION max(1,lwork), where lwork

≥ m when norm = 'I'; otherwise, work is not referenced.

Output Parameters

REAL for slange/clangevalDOUBLE PRECISION for dlange/zlangeValue returned by the function.

?langtReturns the value of the 1-norm, Frobenius norm,infinity-norm, or the largest absolute value of anyelement of a general tridiagonal matrix.

Syntax

val = slangt( norm, n, dl, d, du )

val = dlangt( norm, n, dl, d, du )

val = clangt( norm, n, dl, d, du )

val = zlangt( norm, n, dl, d, du )

1311


Description

The routine returns the value of the 1-norm, or the Frobenius norm, or the infinity norm, orthe element of largest absolute value of a real/complex tridiagonal matrix A.



= norm1(A), if norm = '1' or 'O'or 'o'




Input Parameters


norm

INTEGER. The order of the matrix A. n ≥ 0. When n = 0,?langt is set to zero.

n

REAL for slangtdl, d, duDOUBLE PRECISION for dlangtCOMPLEX for clangtCOMPLEX*16 for zlangtArrays: dl (n-1), d (n), du (n-1).The array dl contains the (n-1) sub-diagonal elements ofA.The array d contains the diagonal elements of A.The array du contains the (n-1) super-diagonal elements ofA.

Output Parameters

REAL for slangt/clangtvalDOUBLE PRECISION for dlangt/zlangtValue returned by the function.

1312


?lanhsReturns the value of the 1-norm, Frobenius norm,infinity-norm, or the largest absolute value of anyelement of an upper Hessenberg matrix.

Syntax

val = slanhs( norm, n, a, lda, work )

val = dlanhs( norm, n, a, lda, work )

val = clanhs( norm, n, a, lda, work )

val = zlanhs( norm, n, a, lda, work )

Description

The function ?lanhs returns the value of the 1-norm, or the Frobenius norm, or the infinitynorm, or the element of largest absolute value of a Hessenberg matrix A.







Input Parameters


norm


n ≥ 0. When n = 0, ?lanhs is set to zero.

REAL for slanhsaDOUBLE PRECISION for dlanhsCOMPLEX for clanhs

1313


COMPLEX*16 for zlanhsArray, DIMENSION (lda,n). The n-by-n upper Hessenbergmatrix A; the part of A below the first sub-diagonal is notreferenced.


lda ≥ max(n,1).

REAL for slanhs and clanhs.workDOUBLE PRECISION for dlange and zlange.Workspace array, DIMENSION (max(1,lwork)), where

lwork ≥ n when norm = 'I'; otherwise, work is notreferenced.

Output Parameters

REAL for slanhs/clanhsvalDOUBLE PRECISION for dlanhs/zlanhsValue returned by the function.

?lansbReturns the value of the 1-norm, or the Frobeniusnorm, or the infinity norm, or the element oflargest absolute value of a symmetric band matrix.

Syntax

val = slansb( norm, uplo, n, k, ab, ldab, work )

val = dlansb( norm, uplo, n, k, ab, ldab, work )

val = clansb( norm, uplo, n, k, ab, ldab, work )

val = zlansb( norm, uplo, n, k, ab, ldab, work )

Description

The function ?lansb returns the value of the 1-norm, or the Frobenius norm, or the infinitynorm, or the element of largest absolute value of an n-by-n real/complex symmetric bandmatrix A, with k super-diagonals.


1314







Input Parameters


norm

CHARACTER*1.uploSpecifies whether the upper or lower triangular part of theband matrix A is supplied. If uplo = 'U': upper triangularpart is supplied; If uplo = 'L': lower triangular part issupplied.

INTEGER. The order of the matrix A. n ≥ 0.n

When n = 0, ?lansb is set to zero.

INTEGER. The number of super-diagonals or sub-diagonals

of the band matrix A. k ≥ 0.

k

REAL for slansbabDOUBLE PRECISION for dlansbCOMPLEX for clansbCOMPLEX*16 for zlansbArray, DIMENSION (ldab,n).The upper or lower triangle of the symmetric band matrixA, stored in the first k+1 rows of ab. The j-th column of Ais stored in the j-th column of the array ab as follows:if uplo = 'U', ab(k+1+i-j,j) = a(i,j)

for max(1,j-k) ≤ i≤ j;if uplo = 'L', ab(1+i-j,j) = a(i,j) for

j≤i≤min(n,j+k).


1315


ldab ≥ k+1.

REAL for slansb and clansb.workDOUBLE PRECISION for dlansb and zlansb.Workspace array, DIMENSION (max(1,lwork)), where

lwork ≥ n when norm = 'I' or '1' or 'O'; otherwise,work is not referenced.

Output Parameters

REAL for slansb/clansbvalDOUBLE PRECISION for dlansb/zlansbValue returned by the function.

?lanhbReturns the value of the 1-norm, or the Frobeniusnorm, or the infinity norm, or the element oflargest absolute value of a Hermitian band matrix.

Syntax

val = clanhb( norm, uplo, n, k, ab, ldab, work )

val = zlanhb( norm, uplo, n, k, ab, ldab, work )

Description

The routine returns the value of the 1-norm, or the Frobenius norm, or the infinity norm, orthe element of largest absolute value of an n-by-n Hermitian band matrix A, with ksuper-diagonals.






1316



Input Parameters


norm

CHARACTER*1.uploSpecifies whether the upper or lower triangular part of theband matrix A is supplied.If uplo = 'U': upper triangular part is supplied;If uplo = 'L': lower triangular part is supplied.

INTEGER. The order of the matrix A. n ≥ 0. When n = 0,?lanhb is set to zero.

n

INTEGER. The number of super-diagonals or sub-diagonalsof the band matrix A.

k

k ≥ 0.

COMPLEX for clanhb.abCOMPLEX*16 for zlanhb.Array, DIMENSION (ldaB,n). The upper or lower triangle ofthe Hermitian band matrix A, stored in the first k+1 rowsof ab. The j-th column of A is stored in the j-th column ofthe array ab as follows:if uplo = 'U', ab(k+1+i-j,j) = a(i,j)

for max(1,j-k) ≤ i ≤ j;

if uplo = 'L', ab(1+i-j,j) = a(i,j) for j ≤ i ≤min(n,j+k).Note that the imaginary parts of the diagonal elements neednot be set and are assumed to be zero.

INTEGER. The leading dimension of the array ab. ldab ≥k+1.

ldab

REAL for clanhb.workDOUBLE PRECISION for zlanhb.Workspace array, DIMENSION max(1, lwork), where

1317



Output Parameters

REAL for slanhb/clanhbvalDOUBLE PRECISION for dlanhb/zlanhbValue returned by the function.

?lanspReturns the value of the 1-norm, or the Frobeniusnorm, or the infinity norm, or the element oflargest absolute value of a symmetric matrixsupplied in packed form.

Syntax

val = slansp( norm, uplo, n, ap, work )

val = dlansp( norm, uplo, n, ap, work )

val = clansp( norm, uplo, n, ap, work )

val = zlansp( norm, uplo, n, ap, work )

Description

The function ?lansp returns the value of the 1-norm, or the Frobenius norm, or the infinitynorm, or the element of largest absolute value of a real/complex symmetric matrix A, suppliedin packed form.






1318



Input Parameters


norm

CHARACTER*1.uploSpecifies whether the upper or lower triangular part of thesymmetric matrix A is supplied.If uplo = 'U': Upper triangular part of A is suppliedIf uplo = 'L': Lower triangular part of A is supplied.

INTEGER. The order of the matrix A. n ≥ 0. Whenn

n = 0, ?lansp is set to zero.

REAL for slanspapDOUBLE PRECISION for dlanspCOMPLEX for clanspCOMPLEX*16 for zlanspArray, DIMENSION (n(n+1)/2).The upper or lower triangle of the symmetric matrix A,packed columnwise in a linear array. The j-th column of Ais stored in the array ap as follows:

if uplo = 'U', ap(i + (j-1)j/2) = A(i,j) for 1 ≤ i

≤ j;if uplo = 'L', ap(i + (j-1)(2n-j)/2) = A(i,j) for j

≤ i ≤ n.

REAL for slansp and clansp.workDOUBLE PRECISION for dlansp and zlansp.Workspace array, DIMENSION (max(1,lwork)), where


Output Parameters

REAL for slansp/clanspval

1319


DOUBLE PRECISION for dlansp/zlanspValue returned by the function.

?lanhpReturns the value of the 1-norm, or the Frobeniusnorm, or the infinity norm, or the element oflargest absolute value of a complex Hermitianmatrix supplied in packed form.

Syntax

val = clanhp( norm, uplo, n, ap, work )

val = zlanhp( norm, uplo, n, ap, work )

Description

The function ?lanhp returns the value of the 1-norm, or the Frobenius norm, or the infinitynorm, or the element of largest absolute value of a complex Hermitian matrix A, supplied inpacked form.







Input Parameters


norm

CHARACTER*1.uploSpecifies whether the upper or lower triangular part of theHermitian matrix A is supplied.

1320


If uplo = 'U': Upper triangular part of A is suppliedIf uplo = 'L': Lower triangular part of A is supplied.


n ≥ 0. When n = 0, ?lanhp is set to zero.

COMPLEX for clanhp.apCOMPLEX*16 for zlanhp.Array, DIMENSION (n(n+1)/2). The upper or lower triangleof the Hermitian matrix A, packed columnwise in a lineararray. The j-th column of A is stored in the array ap asfollows:


≤ j;if uplo = 'L', ap(i + (j-1)(2n-j)/2) = A(i,j) for j

≤ i ≤ n.

REAL for clanhp.workDOUBLE PRECISION for zlanhp.Workspace array, DIMENSION (max(1,lwork)), where


Output Parameters

REAL for clanhp.valDOUBLE PRECISION for zlanhp.Value returned by the function.

1321


?lanst/?lanhtReturns the value of the 1-norm, or the Frobeniusnorm, or the infinity norm, or the element oflargest absolute value of a real symmetric orcomplex Hermitian tridiagonal matrix.

Syntax

val = slanst( norm, n, d, e )

val = dlanst( norm, n, d, e )

val = clanht( norm, n, d, e )

val = zlanht( norm, n, d, e )

Description

The functions ?lanst/?lanht return the value of the 1-norm, or the Frobenius norm, or theinfinity norm, or the element of largest absolute value of a real symmetric or a complex Hermitiantridiagonal matrix A.







Input Parameters


norm


n ≥ 0. When n = 0, ?lanst/?lanht is set to zero.

REAL for slanst/clanhtd

1322


DOUBLE PRECISION for dlanst/zlanhtArray, DIMENSION (n). The diagonal elements of A.

REAL for slansteDOUBLE PRECISION for dlanstCOMPLEX for clanhtCOMPLEX*16 for zlanhtArray, DIMENSION (n-1).The (n-1) sub-diagonal or super-diagonal elements of A.

Output Parameters

REAL for slanst/clanhtvalDOUBLE PRECISION for dlanst/zlanhtValue returned by the function.

?lansyReturns the value of the 1-norm, or the Frobeniusnorm, or the infinity norm, or the element oflargest absolute value of a real/complex symmetricmatrix.

Syntax

val = slansy( norm, uplo, n, a, lda, work )

val = dlansy( norm, uplo, n, a, lda, work )

val = clansy( norm, uplo, n, a, lda, work )

val = zlansy( norm, uplo, n, a, lda, work )

Description

The function ?lansy returns the value of the 1-norm, or the Frobenius norm, or the infinitynorm, or the element of largest absolute value of a real/complex symmetric matrix A.





1323




Input Parameters


norm

CHARACTER*1.uploSpecifies whether the upper or lower triangular part of thesymmetric matrix A is to be referenced.= 'U': Upper triangular part of A is referenced.= 'L': Lower triangular part of A is referenced

INTEGER. The order of the matrix A. n ≥ 0. When n = 0,?lansy is set to zero.

n

REAL for slansyaDOUBLE PRECISION for dlansyCOMPLEX for clansyCOMPLEX*16 for zlansyArray, DIMENSION (lda,n). The symmetric matrix A.If uplo = 'U', the leading n-by-n upper triangular part ofa contains the upper triangular part of the matrix A, andthe strictly lower triangular part of a is not referenced.If uplo = 'L', the leading n-by-n lower triangular part ofa contains the lower triangular part of the matrix A, and thestrictly upper triangular part of a is not referenced.


lda ≥ max(n,1).

REAL for slansy and clansy.workDOUBLE PRECISION for dlansy and zlansy.Workspace array, DIMENSION (max(1,lwork)), where


1324


Output Parameters

REAL for slansy/clansyvalDOUBLE PRECISION for dlansy/zlansyValue returned by the function.

?lanheReturns the value of the 1-norm, or the Frobeniusnorm, or the infinity norm, or the element oflargest absolute value of a complex Hermitianmatrix.

Syntax

val = clanhe( norm, uplo, n, a, lda, work )

val = zlanhe( norm, uplo, n, a, lda, work )

Description

The function ?lanhe returns the value of the 1-norm, or the Frobenius norm, or the infinitynorm, or the element of largest absolute value of a complex Hermitian matrix A.



= norm1(A), if norm = '1' or 'O'or 'o'




Input Parameters


norm

CHARACTER*1.uplo

1325


Specifies whether the upper or lower triangular part of theHermitian matrix A is to be referenced.= 'U': Upper triangular part of A is referenced.= 'L': Lower triangular part of A is referenced

INTEGER. The order of the matrix A. n ≥ 0. When n = 0,?lanhe is set to zero.

n

COMPLEX for clanhe.aCOMPLEX*16 for zlanhe.Array, DIMENSION (lda,n). The Hermitian matrix A.If uplo = 'U', the leading n-by-n upper triangular part ofa contains the upper triangular part of the matrix A, andthe strictly lower triangular part of a is not referenced.If uplo = 'L', the leading n-by-n lower triangular part ofa contains the lower triangular part of the matrix A, and thestrictly upper triangular part of a is not referenced.


lda ≥ max(n,1).

REAL for clanhe.workDOUBLE PRECISION for zlanhe.Workspace array, DIMENSION (max(1,lwork)) , where


Output Parameters

REAL for clanhe.valDOUBLE PRECISION for zlanhe.Value returned by the function.

1326


?lantbReturns the value of the 1-norm, or the Frobeniusnorm, or the infinity norm, or the element oflargest absolute value of a triangular band matrix.

Syntax

val = slantb( norm, uplo, diag, n, k, ab, ldab, work )

val = dlantb( norm, uplo, diag, n, k, ab, ldab, work )

val = clantb( norm, uplo, diag, n, k, ab, ldab, work )

val = zlantb( norm, uplo, diag, n, k, ab, ldab, work )

Description

The function ?lantb returns the value of the 1-norm, or the Frobenius norm, or the infinitynorm, or the element of largest absolute value of an n-by-n triangular band matrix A, with ( k+ 1 ) diagonals.







Input Parameters


norm

CHARACTER*1.uploSpecifies whether the matrix A is upper or lower triangular.= 'U': Upper triangular= 'L': Lower triangular.

1327


CHARACTER*1.diagSpecifies whether or not the matrix A is unit triangular.= 'N': Non-unit triangular= 'U': Unit triangular.

INTEGER. The order of the matrix A. n ≥ 0. When n = 0,?lantb is set to zero.

n

INTEGER. The number of super-diagonals of the matrix A ifuplo = 'U', or the number of sub-diagonals of the matrix

A if uplo = 'L'. k ≥ 0.

k

REAL for slantbabDOUBLE PRECISION for dlantbCOMPLEX for clantbCOMPLEX*16 for zlantbArray, DIMENSION (ldab,n). The upper or lower triangularband matrix A, stored in the first k+1 rows of ab.The j-th column of A is stored in the j-th column of thearray ab as follows:if uplo = 'U', ab(k+1+i-j,j) = a(i,j) for max(1,j-k)

≤ i ≤ j;

if uplo = 'L', ab(1+i-j,j) = a(i,j) for j≤ i≤min(n,j+k).Note that when diag = 'U', the elements of the array abcorresponding to the diagonal elements of the matrix A arenot referenced, but are assumed to be one.


ldab ≥ k+1.

REAL for slantb and clantb.workDOUBLE PRECISION for dlantb and zlantb.Workspace array, DIMENSION (max(1,lwork)), where

lwork ≥ n when norm = 'I' ; otherwise, work is notreferenced.

Output Parameters

REAL for slantb/clantb.valDOUBLE PRECISION for dlantb/zlantb.

1328


Value returned by the function.

?lantpReturns the value of the 1-norm, or the Frobeniusnorm, or the infinity norm, or the element oflargest absolute value of a triangular matrixsupplied in packed form.

Syntax

val = slantp( norm, uplo, diag, n, ap, work )

val = dlantp( norm, uplo, diag, n, ap, work )

val = clantp( norm, uplo, diag, n, ap, work )

val = zlantp( norm, uplo, diag, n, ap, work )

Description

The function ?lantp returns the value of the 1-norm, or the Frobenius norm, or the infinitynorm, or the element of largest absolute value of a triangular matrix A, supplied in packedform.







Input Parameters


norm

CHARACTER*1.uplo

1329


Specifies whether the matrix A is upper or lower triangular.= 'U': Upper triangular= 'L': Lower triangular.

CHARACTER*1.diagSpecifies whether or not the matrix A is unit triangular.= 'N': Non-unit triangular= 'U': Unit triangular.


n ≥ 0. When n = 0, ?lantp is set to zero.

REAL for slantpapDOUBLE PRECISION for dlantpCOMPLEX for clantpCOMPLEX*16 for zlantpArray, DIMENSION (n(n+1)/2).The upper or lower triangular matrix A, packed columnwisein a linear array. The j-th column of A is stored in the arrayap as follows:

if uplo = 'U', AP(i + (j-1)j/2) = a(i,j) for 1≤ i≤j;if uplo = 'L', ap(i + (j-1)(2n-j)/2) = a(i,j) for

j≤ i≤ n.Note that when diag = 'U', the elements of the array apcorresponding to the diagonal elements of the matrix A arenot referenced, but are assumed to be one.

REAL for slantp and clantp.workDOUBLE PRECISION for dlantp and zlantp.Workspace array, DIMENSION (max(1,lwork)), where

lwork ≥ n when norm = 'I' ; otherwise, work is notreferenced.

Output Parameters

REAL for slantp/clantp.valDOUBLE PRECISION for dlantp/zlantp.Value returned by the function.

1330


?lantrReturns the value of the 1-norm, or the Frobeniusnorm, or the infinity norm, or the element oflargest absolute value of a trapezoidal or triangularmatrix.

Syntax

val = slantr( norm, uplo, diag, m, n, a, lda, work )

val = dlantr( norm, uplo, diag, m, n, a, lda, work )

val = clantr( norm, uplo, diag, m, n, a, lda, work )

val = zlantr( norm, uplo, diag, m, n, a, lda, work )

Description

The function ?lantr returns the value of the 1-norm, or the Frobenius norm, or the infinitynorm, or the element of largest absolute value of a trapezoidal or triangular matrix A.







Input Parameters


norm

CHARACTER*1.uploSpecifies whether the matrix A is upper or lower trapezoidal.= 'U': Upper trapezoidal= 'L': Lower trapezoidal.

1331


Note that A is triangular instead of trapezoidal if m = n.

CHARACTER*1.diagSpecifies whether or not the matrix A has unit diagonal.= 'N': Non-unit diagonal= 'U': Unit diagonal.

INTEGER. The number of rows of the matrix A. m ≥ 0, and

if uplo = 'U', m ≤ n.

m

When m = 0, ?lantr is set to zero.

INTEGER. The number of columns of the matrix A. n ≥ 0,

and if uplo = 'L', n ≤ m.

n

When n = 0, ?lantr is set to zero.

REAL for slantraDOUBLE PRECISION for dlantrCOMPLEX for clantrCOMPLEX*16 for zlantrArray, DIMENSION (lda,n).The trapezoidal matrix A (A is triangular if m = n).If uplo = 'U', the leading m-by-n upper trapezoidal partof the array a contains the upper trapezoidal matrix, andthe strictly lower triangular part of A is not referenced.If uplo = 'L', the leading m-by-n lower trapezoidal partof the array a contains the lower trapezoidal matrix, andthe strictly upper triangular part of A is not referenced. Notethat when diag = 'U', the diagonal elements of A are notreferenced and are assumed to be one.


lda ≥ max(m,1).

REAL for slantr/clantrp.workDOUBLE PRECISION for dlantr/zlantr.Workspace array, DIMENSION (max(1,lwork)), where

lwork ≥ m when norm = 'I' ; otherwise, work is notreferenced.

1332


Output Parameters

REAL for slantr/clantrp.valDOUBLE PRECISION for dlantr/zlantr.Value returned by the function.

?lanv2Computes the Schur factorization of a real 2-by-2nonsymmetric matrix in standard form.

Syntax

call slanv2( a, b, c, d, rt1r, rt1i, rt2r, rt2i, cs, sn )

call dlanv2( a, b, c, d, rt1r, rt1i, rt2r, rt2i, cs, sn )

Description

The routine computes the Schur factorization of a real 2-by-2 nonsymmetric matrix in standardform:

where either

1. cc = 0 so that aa and dd are real eigenvalues of the matrix, or

2. aa = dd and bb*cc < 0, so that aa ± sqrt(bb*cc) are complex conjugate eigenvalues.

The routine was adjusted to reduce the risk of cancellation errors, when computing real

eigenvalues, and to ensure, if possible, that abs(rt1r) ≥ abs(rt2r).

Input Parameters

REAL for slanv2a, b, c, dDOUBLE PRECISION for dlanv2.On entry, elements of the input matrix.

1333

Output Parameters

On exit, overwritten by the elements of the standardizedSchur form.

a, b, c, d

REAL for slanv2rt1r, rt1i, rt2r, rt2iDOUBLE PRECISION for dlanv2.The real and imaginary parts of the eigenvalues.If the eigenvalues are a complex conjugate pair, rt1i >0.

REAL for slanv2cs, snDOUBLE PRECISION for dlanv2.Parameters of the rotation matrix.

?lapllMeasures the linear dependence of two vectors.

Syntax

call slapll( n, x, incx, Y, incy, ssmin )

call dlapll( n, x, incx, Y, incy, ssmin )

call clapll( n, x, incx, Y, incy, ssmin )

call zlapll( n, x, incx, Y, incy, ssmin )

Description

Given two column vectors x and y of length n, let

A = (x y) be the n-by-2 matrix.

The routine ?lapll first computes the QR factorization of A as A = Q*R and then computes theSVD of the 2-by-2 upper triangular matrix R. The smaller singular value of R is returned inssmin, which is used as the measurement of the linear dependency of the vectors x and y.

Input Parameters

INTEGER. The length of the vectors x and y.n

REAL for slapllxDOUBLE PRECISION for dlapllCOMPLEX for clapll

1334


COMPLEX*16 for zlapllArray, DIMENSION (1+(n-1)incx).On entry, x contains the n-vector x.

REAL for slapllyDOUBLE PRECISION for dlapllCOMPLEX for clapllCOMPLEX*16 for zlapllArray, DIMENSION (1+(n-1)incy).On entry, y contains the n-vector y.

INTEGER. The increment between successive elements ofx; incx > 0.

incx

INTEGER. The increment between successive elements ofy; incy > 0.

incy

Output Parameters

On exit, x is overwritten.x

On exit, y is overwritten.y

REAL for slapll/clapllssminDOUBLE PRECISION for dlapll/zlapllThe smallest singular value of the n-by-2 matrix A = (x y).

?lapmtPerforms a forward or backward permutation ofthe columns of a matrix.

Syntax

call slapmt( forwrd, m, n, x, ldx, k )

call dlapmt( forwrd, m, n, x, ldx, k )

call clapmt( forwrd, m, n, x, ldx, k )

call zlapmt( forwrd, m, n, x, ldx, k )

1335


Description

The routine ?lapmt rearranges the columns of the m-by-n matrix X as specified by thepermutation k(1),k(2),...,k(n) of the integers 1,...,n.

If forwrd = .TRUE., forward permutation:

X(*,k(j)) is moved to X(*,j) for j=1,2,...,n.

If forwrd = .FALSE., backward permutation:

X(*,j) is moved to X(*,k(j)) for j = 1,2,...,n.

Input Parameters

LOGICAL.forwrdIf forwrd = .TRUE., forward permutationIf forwrd = .FALSE., backward permutation

INTEGER. The number of rows of the matrix X. m ≥ 0.m

INTEGER. The number of columns of the matrix X. n ≥ 0.n

REAL for slapmtxDOUBLE PRECISION for dlapmtCOMPLEX for clapmtCOMPLEX*16 for zlapmtArray, DIMENSION (ldx,n). On entry, the m-by-n matrix X.

INTEGER. The leading dimension of the array X, ldx ≥max(1,m).

ldx

INTEGER. Array, DIMENSION (n). On entry, k contains thepermutation vector and is used as internal workspace.

k

Output Parameters

On exit, x contains the permuted matrix X.x

On exit, k is reset to its original value.k

1336


?lapy2Returns sqrt(x2+y2).

Syntax

val = slapy2( x, y )

val = dlapy2( x, y )

Description

The function ?lapy2 returns sqrt(x2+y2), avoiding unnecessary overflow or harmful underflow.

Input Parameters

REAL for slapy2x, yDOUBLE PRECISION for dlapy2Specify the input values x and y.

Output Parameters

REAL for slapy2valDOUBLE PRECISION for dlapy2.Value returned by the function.

?lapy3Returns sqrt(x2+y2+z2).

Syntax

val = slapy3( x, y, z )

val = dlapy3( x, y, z )

Description

The function ?lapy3 returns sqrt(x2+y2+z2), avoiding unnecessary overflow or harmfulunderflow.

1337


Input Parameters

REAL for slapy3x, y, zDOUBLE PRECISION for dlapy3Specify the input values x, y and z.

Output Parameters

REAL for slapy3valDOUBLE PRECISION for dlapy3.Value returned by the function.

?laqgbScales a general band matrix, using row andcolumn scaling factors computed by ?gbequ.

Syntax

call slaqgb( m, n, kl, ku, ab, ldab, r, c, rowcnd, colcnd, amax, equed )

call dlaqgb( m, n, kl, ku, ab, ldab, r, c, rowcnd, colcnd, amax, equed )

call claqgb( m, n, kl, ku, ab, ldab, r, c, rowcnd, colcnd, amax, equed )

call zlaqgb( m, n, kl, ku, ab, ldab, r, c, rowcnd, colcnd, amax, equed )

Description

The routine equilibrates a general m-by-n band matrix A with kl subdiagonals and kusuperdiagonals using the row and column scaling factors in the vectors r and c.

Input Parameters

INTEGER. The number of rows of the matrix A. m ≥ 0.m

INTEGER. The number of columns of the matrix A. n ≥ 0.n

INTEGER. The number of subdiagonals within the band of

A. kl ≥ 0.

kl

INTEGER. The number of superdiagonals within the band of

A. ku ≥ 0.

ku

1338


REAL for slaqgbabDOUBLE PRECISION for dlaqgbCOMPLEX for claqgbCOMPLEX*16 for zlaqgbArray, DIMENSION (ldab,n). On entry, the matrix A in bandstorage, in rows 1 to kl+ku+1. The j-th column of A isstored in the j-th column of the array ab as follows:

ab(ku+1+i-j,j) = A(i,j) for max(1,j-ku) ≤ i ≤min(m,j+kl).


lda ≥ kl+ku+1.

REAL for slaqgb/claqgbamaxDOUBLE PRECISION for dlaqgb/zlaqgbAbsolute value of largest matrix entry.

REAL for slaqgb/claqgbr, cDOUBLE PRECISION for dlaqgb/zlaqgbArrays r (m), c (n). Contain the row and column scale factorsfor A, respectively.

REAL for slaqgb/claqgbrowcndDOUBLE PRECISION for dlaqgb/zlaqgbRatio of the smallest r(i) to the largest r(i).

REAL for slaqgb/claqgbcolcndDOUBLE PRECISION for dlaqgb/zlaqgbRatio of the smallest c(i) to the largest c(i).

Output Parameters

On exit, the equilibrated matrix, in the same storage formatas A.

ab

See equed for the form of the equilibrated matrix.

CHARACTER*1.equedSpecifies the form of equilibration that was done.If equed = 'N': No equilibrationIf equed = 'R': Row equilibration, that is, A has beenpremultiplied by diag(r).If equed = 'C': Column equilibration, that is, A has beenpostmultiplied by diag(c).

1339


If equed = 'B': Both row and column equilibration, thatis, A has been replaced by diag(r)*A*diag(c).

Application Notes

The routine uses internal parameters thresh, large, and small, which have the followingmeaning. thresh is a threshold value used to decide if row or column scaling should be donebased on the ratio of the row or column scaling factors. If rowcnd < thresh, row scaling isdone, and if colcnd < thresh, column scaling is done. large and small are threshold valuesused to decide if row scaling should be done based on the absolute size of the largest matrixelement. If amax > large or amax < small, row scaling is done.

?laqgeScales a general rectangular matrix, using row andcolumn scaling factors computed by ?geequ.

Syntax

call slaqge( m, n, a, lda, r, c, rowcnd, colcnd, amax, equed )

call dlaqge( m, n, a, lda, r, c, rowcnd, colcnd, amax, equed )

call claqge( m, n, a, lda, r, c, rowcnd, colcnd, amax, equed )

call zlaqge( m, n, a, lda, r, c, rowcnd, colcnd, amax, equed )

Description

The routine equilibrates a general m-by-n matrix A using the row and column scaling factors inthe vectors r and c.

Input Parameters


m ≥ 0.


n ≥ 0.

REAL for slaqgeaDOUBLE PRECISION for dlaqgeCOMPLEX for claqge

1340

COMPLEX*16 for zlaqgeArray, DIMENSION (lda,n). On entry, the m-by-n matrix A.


lda ≥ max(m,1).

REAL for slanqge/claqgerDOUBLE PRECISION for dlaqge/zlaqgeArray, DIMENSION (m). The row scale factors for A.

REAL for slanqge/claqgecDOUBLE PRECISION for dlaqge/zlaqgeArray, DIMENSION (n). The column scale factors for A.

REAL for slanqge/claqgerowcndDOUBLE PRECISION for dlaqge/zlaqgeRatio of the smallest r(i) to the largest r(i).

REAL for slanqge/claqgecolcndDOUBLE PRECISION for dlaqge/zlaqgeRatio of the smallest c(i) to the largest c(i).

REAL for slanqge/claqgeamaxDOUBLE PRECISION for dlaqge/zlaqgeAbsolute value of largest matrix entry.

Output Parameters

On exit, the equilibrated matrix.aSee equed for the form of the equilibrated matrix.

CHARACTER*1.equedSpecifies the form of equilibration that was done.If equed = 'N': No equilibrationIf equed = 'R': Row equilibration, that is, A has beenpremultiplied by diag(r).If equed = 'C': Column equilibration, that is, A has beenpostmultiplied by diag(c).If equed = 'B': Both row and column equilibration, thatis, A has been replaced by diag(r)*A*diag(c).

1341


Application Notes

The routine uses internal parameters thresh, large, and small, which have the followingmeaning. thresh is a threshold value used to decide if row or column scaling should be donebased on the ratio of the row or column scaling factors. If rowcnd < thresh, row scaling isdone, and if colcnd < thresh, column scaling is done. large and small are threshold valuesused to decide if row scaling should be done based on the absolute size of the largest matrixelement. If amax > large or amax < small, row scaling is done.

?laqhbScales a Hermetian band matrix, using scalingfactors computed by ?pbequ.

Syntax

call claqhb( uplo, n, kd, ab, ldab, s, scond, amax, equed )

call zlaqhb( uplo, n, kd, ab, ldab, s, scond, amax, equed )

Description

The routine equilibrates a Hermetian band matrix A using the scaling factors in the vector s.

Input Parameters

CHARACTER*1.uploSpecifies whether the upper or lower triangular part of theband matrix A is stored.If uplo = 'U': upper triangular.If uplo = 'L': lower triangular.


n ≥ 0.

INTEGER. The number of super-diagonals of the matrix A ifuplo = 'U', or the number of sub-diagonals if uplo ='L'.

kd

kd ≥ 0.

COMPLEX for claqhbabCOMPLEX*16 for zlaqhb

1342

Array, DIMENSION (ldab,n). On entry, the upper or lowertriangle of the band matrix A, stored in the first kd+1 rowsof the array. The j-th column of A is stored in the j-thcolumn of the array ab as follows:if uplo = 'U', ab(kd+1+i-j,j) = A(i,j) for

max(1,j-kd) ≤ i ≤ j;

if uplo = 'L', ab(1+i-j,j) = A(i,j) for j ≤ i ≤min(n,j+kd).


ldab ≥ kd+1.

REAL for claqsbscondDOUBLE PRECISION for zlaqsbRatio of the smallest s(i) to the largest s(i).

REAL for claqsbamaxDOUBLE PRECISION for zlaqsbAbsolute value of largest matrix entry.

Output Parameters

On exit, if info = 0, the triangular factor U or L from theCholesky factorization A = U'*U or A = L*L' of the bandmatrix A, in the same storage format as A.

ab

REAL for claqsbsDOUBLE PRECISION for zlaqsbArray, DIMENSION (n). The scale factors for A.

CHARACTER*1.equedSpecifies whether or not equilibration was done.If equed = 'N': No equilibration.If equed = 'Y': Equilibration was done, that is, A has beenreplaced by diag(s)*A*diag(s).

Application Notes

The routine uses internal parameters thresh, large, and small, which have the followingmeaning. thresh is a threshold value used to decide if scaling should be based on the ratio ofthe scaling factors. If scond < thresh, scaling is done.

1343

The values large and small are threshold values used to decide if scaling should be done basedon the absolute size of the largest matrix element. If amax > large or amax < small, scalingis done.

?laqp2Computes a QR factorization with column pivotingof the matrix block.

Syntax

call slaqp2( m, n, offset, a, lda, jpvt, tau, vn1, vn2, work )

call dlaqp2( m, n, offset, a, lda, jpvt, tau, vn1, vn2, work )

call claqp2( m, n, offset, a, lda, jpvt, tau, vn1, vn2, work )

call zlaqp2( m, n, offset, a, lda, jpvt, tau, vn1, vn2, work )

Description

The routine computes a QR factorization with column pivoting of the block A(offset+1:m,1:n).The block A(1:offset,1:n) is accordingly pivoted, but not factorized.

Input Parameters



INTEGER. The number of rows of the matrix A that must be

pivoted but no factorized. offset ≥ 0.

offset

REAL for slaqp2aDOUBLE PRECISION for dlaqp2COMPLEX for claqp2COMPLEX*16 for zlaqp2Array, DIMENSION (lda,n). On entry, the m-by-n matrix A.

INTEGER. The leading dimension of the array a. lda ≥max(1,m).

lda

INTEGER.jpvtArray, DIMENSION (n).

1344

On entry, if jpvt(i) ≠ 0, the i-th column of A is permutedto the front of A*P (a leading column); if jpvt(i) = 0, thei-th column of A is a free column.

REAL for slaqp2/claqp2vn1, vn2DOUBLE PRECISION for dlaqp2/zlaqp2Arrays, DIMENSION (n) each. Contain the vectors with thepartial and exact column norms, respectively.

REAL for slaqp2workDOUBLE PRECISION for dlaqp2COMPLEX for claqp2COMPLEX*16 for zlaqp2 Workspace array, DIMENSION (n).

Output Parameters

On exit, the upper triangle of block A(offset+1:m,1:n) isthe triangular factor obtained; the elements in blockA(offset+1:m,1:n) below the diagonal, together with the

a

array tau, represent the orthogonal matrix Q as a productof elementary reflectors. Block A(1:offset,1:n) has beenaccordingly pivoted, but not factorized.

On exit, if jpvt(i) = k, then the i-th column of A*P wasthe k-th column of A.

jpvt

REAL for slaqp2tauDOUBLE PRECISION for dlaqp2COMPLEX for claqp2COMPLEX*16 for zlaqp2Array, DIMENSION (min(m,n)).The scalar factors of the elementary reflectors.

Contain the vectors with the partial and exact column norms,respectively.

vn1, vn2

1345


?laqpsComputes a step of QR factorization with columnpivoting of a real m-by-n matrix A by using BLASlevel 3.

Syntax

call slaqps( m, n, offset, nb, kb, a, lda, jpvt, tau, vn1, vn2, auxv, f, ldf)

call dlaqps( m, n, offset, nb, kb, a, lda, jpvt, tau, vn1, vn2, auxv, f, ldf)

call claqps( m, n, offset, nb, kb, a, lda, jpvt, tau, vn1, vn2, auxv, f, ldf)

call zlaqps( m, n, offset, nb, kb, a, lda, jpvt, tau, vn1, vn2, auxv, f, ldf)

Description

This routine computes a step of QR factorization with column pivoting of a real m-by-n matrixA by using BLAS level 3. The routine tries to factorize NB columns from A starting from the rowoffset+1, and updates all of the matrix with BLAS level 3 routine ?gemm.

In some cases, due to catastrophic cancellations, ?laqps cannot factorize NB columns. Hence,the actual number of factorized columns is returned in kb.

Block A(1:offset,1:n) is accordingly pivoted, but not factorized.

Input Parameters



INTEGER. The number of rows of A that have been factorizedin previous steps.

offset

INTEGER. The number of columns to factorize.nb

REAL for slaqpsaDOUBLE PRECISION for dlaqpsCOMPLEX for claqpsCOMPLEX*16 for zlaqps

1346


Array, DIMENSION (lda,n).On entry, the m-by-n matrix A.


lda ≥ max(1,m).

INTEGER. Array, DIMENSION (n).jpvtIf jpvt(I) = k then column k of the full matrix A has beenpermuted into position i in AP.

REAL for slaqps/claqpsvn1, vn2DOUBLE PRECISION for dlaqps/zlaqpsArrays, DIMENSION (n) each. Contain the vectors with thepartial and exact column norms, respectively.

REAL for slaqpsauxvDOUBLE PRECISION for dlaqpsCOMPLEX for claqpsCOMPLEX*16 for zlaqpsArray, DIMENSION (nb). Auxiliary vector.

REAL for slaqpsfDOUBLE PRECISION for dlaqpsCOMPLEX for claqpsCOMPLEX*16 for zlaqpsArray, DIMENSION (ldf,nb). Matrix F' = L*Y'*A.

INTEGER. The leading dimension of the array f.ldf

ldf ≥ max(1,n).

Output Parameters

INTEGER. The number of columns actually factorized.kb

On exit, block A(offset+1:m,1:kb) is the triangular factorobtained and block A(1:offset,1:n) has been accordinglypivoted, but no factorized. The rest of the matrix, blockA(offset+1:m,kb+1:n) has been updated.

a

INTEGER array, DIMENSION (n). If jpvt(I) = k thencolumn k of the full matrix A has been permuted into positioni in AP.

jpvt

REAL for slaqpstauDOUBLE PRECISION for dlaqps

1347


COMPLEX for claqpsCOMPLEX*16 for zlaqpsArray, DIMENSION (kb). The scalar factors of the elementaryreflectors.

The vectors with the partial and exact column norms,respectively.

vn1, vn2

Auxiliary vector.auxv

Matrix F' = L*Y'*A.f

?laqr0Computes the eigenvalues of a Hessenberg matrix,and optionally the marixes from the Schurdecomposition.

Syntax

call slaqr0( wantt, wantz, n, ilo, ihi, h, ldh, wr, wi, iloz, ihiz, z, ldz,work, lwork, info )

call dlaqr0( wantt, wantz, n, ilo, ihi, h, ldh, wr, wi, iloz, ihiz, z, ldz,work, lwork, info )

call claqr0( wantt, wantz, n, ilo, ihi, h, ldh, w, iloz, ihiz, z, ldz, work,lwork, info )

call zlaqr0( wantt, wantz, n, ilo, ihi, h, ldh, w, iloz, ihiz, z, ldz, work,lwork, info )

Description

This routine computes the eigenvalues of a Hessenberg matrix H, and, optionally, the matricesT and Z from the Schur decomposition H=Z*T*ZH, where T is an upper quasi-triangular/triangularmatrix (the Schur form), and Z is the orthogonal/unitary matrix of Schur vectors.

Optionally Z may be postmultiplied into an input orthogonal/unitary matrix Q so that this routinecan give the Schur factorization of a matrix A which has been reduced to the Hessenberg formH by the orthogonal/unitary matrix Q: A = Q*H*QH = (QZ)*H*(QZ)H.

Input Parameters

LOGICAL.wantt

1348


If wantt = .TRUE., the full Schur form T is required;If wantt = .FALSE., only eigenvalues are required.


INTEGER. The order of the Hessenberg matrix H. (n ≥ 0).n

INTEGER.ilo, ihiIt is assumed that H is already upper triangular in rows andcolumns 1:ilo-1 and ihi+1:n, and if ilo > 1 thenH(ilo, ilo-1) = 0.ilo and ihi are normally set by a previous call to cgebal,and then passed to cgehrd when the matrix output bycgebal is reduced to Hessenberg form. Otherwise, ilo andihi should be set to 1 and n, respectively.

If n > 0, then 1 ≤ ilo ≤ ihi ≤ n.If n=0, then ilo=1 and ihi=0

REAL for slaqr0hDOUBLE PRECISION for dlaqr0COMPLEX for claqr0COMPLEX*16 for zlaqr0.Array, DIMENSION (ldh, n), contains the upper Hessenbergmatrix H.

INTEGER. The leading dimension of the array h. ldh ≥max(1, n).

ldh

INTEGER. Specify the rows of Z to which transformations

must be applied if wantz is .TRUE., 1 ≤ iloz ≤ ilo;

ihi ≤ ihiz ≤ n.

iloz, ihiz

REAL for slaqr0zDOUBLE PRECISION for dlaqr0COMPLEX for claqr0COMPLEX*16 for zlaqr0.Array, DIMENSION (ldz, ihi), contains the matrix Z ifwantz is .TRUE.. If wantz is .FALSE., z is not referenced.

1349


INTEGER. The leading dimension of the array z.ldz

If wantz is .TRUE., then ldz ≥ max(1, ihiz). Otherwise,

ldz ≥ 1.

REAL for slaqr0workDOUBLE PRECISION for dlaqr0COMPLEX for claqr0COMPLEX*16 for zlaqr0.Workspace array with dimension lwork.


lwork ≥ max(1,n) is sufficient, but for the optimalperformance a greater workspace may be required, typicallyas large as 6*n.It is recommended to use the worlspace query to determinethe optimal workspace size. If lwork=-1,then the routineperforms a workspace query: it estimates the optimalworkspace size for the given values of the input parametersn, ilo, and ihi. The estimate is returned in work(1). Noerror messages related to the lwork is issued by xerbla.Neither H nor Z are accessed.

Output Parameters

If info=0 , and wantt is .TRUE., then h contains the upperquasi-triangular/triangular matrix T from the Schurdecomposition (the Schur form).

h

If info=0 , and wantt is .FALSE., then the contents of hare unspecified on exit.(The output values of h when info > 0 are given underthe description of the info parameter below.)The routine may explicitly set h(i,j) for i>j andj=1,2,...ilo-1 or j=ihi+1, ihi+2,...n.

On exit work(1) contains the minimum value of lworkrequired for optimum performance.

work(1)

COMPLEX for claqr0wCOMPLEX*16 for zlaqr0.

1350


Arrays, DIMENSION(n). The computed eigenvalues ofh(ilo:ihi, ilo:ihi) are stored in w(ilo:ihi). If wantt is.TRUE., then the eigenvalues are stored in the same orderas on the diagonal of the Schur form returned in h, withw(i) = h(i,i).

REAL for slaqr0wr, wiDOUBLE PRECISION for dlaqr0Arrays, DIMENSION(ihi) each. The real and imaginary parts,respectively, of the computed eigenvalues of h(ilo:ihi,ilo:ihi) are stored in the wr(ilo:ihi) and wi(ilo:ihi).If two eigenvalues are computed as a complex conjugatepair, they are stored in consecutive elements of wr and wi,say the i-th and (i+1)-th, with wi(i)> 0 and wi(i+1) <0. If wantt is .TRUE. , then the eigenvalues are stored inthe same order as on the diagonal of the Schur formreturned in h, with wr(i) = h(i,i), and ifh(i:i+1,i:i+1)is a 2-by-2 diagonal block, thenwi(i)=sqrt(-h(i+1,i)*h(i,i+1)).

If wantz is .TRUE., then z(ilo:ihi, iloz:ihiz) isreplaced by z(ilo:ihi, iloz:ihiz)*U, where U is theorthogonal/unitary Schur factor of h(ilo:ihi, ilo:ihi).

z

If wantz is .FALSE., z is not referenced.(The output values of z when info > 0 are given underthe description of the info parameter below.)

INTEGER.info= 0: the execution is successful.> 0: if info = i, then the routine failed to compute all theeigenvalues. Elements 1:ilo-1 and i+1:n of wr and wicontain those eigenvalues which have been successfullycomputed.> 0: if wantt is .FALSE., then the remaining unconvergedeigenvalues are the eigenvalues of the upper Hessenbergmatrix rows and columns ilo through info of the finaloutput value of h.

1351

> 0: if wantt is .TRUE., then (initial value of h)*U =U*(final value of h, where U is an orthogonal/unitarymatrix. The final value of h is upper Hessenberg andquasi-triangular/triangular in rows and columns info+1through ihi.> 0: if wantz is .TRUE., then (final value of z(ilo:ihi,iloz:ihiz))=(initial value of z(ilo:ihi, iloz:ihiz)*U,where U is the orthogonal/unitary matrix in the previousexpression (regardless of the value of wantt).> 0: if wantz is .FALSE., then z is not accessed.

?laqr1Sets a scalar multiple of the first column of theproduct of 2-by-2 or 3-by-3 matrix H and specifiedshifts.

Syntax

call slaqr1( n, h, ldh, sr1, si1, sr2, si2, v )

call dlaqr1( n, h, ldh, sr1, si1, sr2, si2, v )

call claqr1( n, h, ldh, s1, s2, v )

call zlaqr1( n, h, ldh, s1, s2, v )

Description

Given a 2-by-2 or 3-by-3 matrix H, this routine sets v to a scalar multiple of the first columnof the product

K = (H - s1*I)*(H - s2*I), or K = (H - (sr1 + i*si1)*I)*(H - (sr2 + i*si2)*I)

scaling to avoid overflows and most underflows.

It is assumed that either 1) sr1 = sr2 and si1 = -si2, or 2) si1 = si2 = 0.

This is useful for starting double implicit shift bulges in the QR algorithm.

Input Parameters

INTEGER.nThe order of the matrix H. n must be equal to 2 or 3.

1352


REAL for slaqr1sr1, si2, sr2, si2DOUBLE PRECISION for dlaqr1Shift values that define K in the formula above.

COMPLEX for claqr1s1, s2COMPLEX*16 for zlaqr1.Shift values that define K in the formula above.

REAL for slaqr1hDOUBLE PRECISION for dlaqr1COMPLEX for claqr1COMPLEX*16 for zlaqr1.Array, DIMENSION (ldh, n), contains 2-by-2 or 3-by-3 matrixH in the formula above.

INTEGER.ldhThe leading dimension of the array h just as declared in the

calling routine. ldh ≥ n.

Output Parameters

REAL for slaqr1vDOUBLE PRECISION for dlaqr1COMPLEX for claqr1COMPLEX*16 for zlaqr1.Array with dimension (n).A scalar multiple of the first column of the matrix K in theformula above.

1353


?laqr2Performs the orthogonal/unitary similaritytransformation of a Hessenberg matrix to detectand deflate fully converged eigenvalues from atrailing principal submatrix (aggresive earlydeflation).

Syntax

call slaqr2( wantt, wantz, n, ktop, kbot, nw, h, ldh, iloz, ihiz, z, ldz, ns,nd, sr, si, v, ldv, nh, t, ldt, nv, wv, ldwv, work, lwork )

call dlaqr2( wantt, wantz, n, ktop, kbot, nw, h, ldh, iloz, ihiz, z, ldz, ns,nd, sr, si, v, ldv, nh, t, ldt, nv, wv, ldwv, work, lwork )

call claqr2( wantt, wantz, n, ktop, kbot, nw, h, ldh, iloz, ihiz, z, ldz, ns,nd, sh, v, ldv, nh, t, ldt, nv, wv, ldwv, work, lwork )

call zlaqr2( wantt, wantz, n, ktop, kbot, nw, h, ldh, iloz, ihiz, z, ldz, ns,nd, sh, v, ldv, nh, t, ldt, nv, wv, ldwv, work, lwork )

Description

This routine accepts as input an upper Hessenberg matrix H and performs an orthogonal/unitarysimilarity transformation designed to detect and deflate fully converged eigenvalues from atrailing principal submatrix. On output H has been overwritten by a new Hessenberg matrixthat is a perturbation of an orthogonal/unitary similarity transformation of H. It is to be hopedthat the final version of H has many zero subdiagonal entries.

This subroutine is identical to ?laqr3 except that it avoids recursion by calling ?lahqr insteadof ?laqr4.

Input Parameters

LOGICAL.wanttIf wantt = .TRUE., then the Hessenberg matrix H is fullyupdated so that the quasi-triangular/triangular Schur factormay be computed (in cooperation with the callingsubroutine).If wantt = .FALSE., then only enough of H is updated topreserve the eigenvalues.

LOGICAL.wantz

1354


If wantz = .TRUE., then the orthogonal/unitary matrix Zis updated so that the orthogonal/unitary Schur factor maybe computed (in cooperation with the calling subroutine).If wantz = .FALSE., then Z is not referenced.

INTEGER. The order of the Hessenberg matrix H and (ifwantz = .TRUE.) the order of the orthogonal/unitary matrixZ. .

n

INTEGER.ktopIt is assumed that either ktop=1 or h(ktop,ktop-1)=0.ktop and kbot together determine an isolated block alongthe diagonal of the Hessenberg matrix.

INTEGER.kbotIt is assumed without a check that either kbot=n orh(kbot+1,kbot)=0. ktop and kbot together determine anisolated block along the diagonal of the Hessenberg matrix.

INTEGER.nw

Size of the deflation window. 1 ≤ nw ≤ (kbot-ktop+1).

REAL for slaqr2hDOUBLE PRECISION for dlaqr2COMPLEX for claqr2COMPLEX*16 for zlaqr2.Array, DIMENSION (ldh, n), on input the initial n-by-nsection of h stores the Hessenberg matrix H undergoingaggressive early deflation.

INTEGER. The leading dimension of the array h just as

declared in the calling subroutine. ldh≥n.

ldh


must be applied if wantz is .TRUE.. 1 ≤ iloz ≤ ihiz ≤n.

iloz, ihiz

REAL for slaqr2zDOUBLE PRECISION for dlaqr2COMPLEX for claqr2COMPLEX*16 for zlaqr2.Array, DIMENSION (ldz, ihi), contains the matrix Z ifwantz is .TRUE.. If wantz is .FALSE., then z is notreferenced.

1355


INTEGER. The leading dimension of the array z just as

declared in the calling subroutine. ldz ≥ 1.

ldz

REAL for slaqr2vDOUBLE PRECISION for dlaqr2COMPLEX for claqr2COMPLEX*16 for zlaqr2.Workspace array with dimension (ldv, nw). An nw-by-nwwork array.

INTEGER. The leading dimension of the array v just as

declared in the calling subroutine. ldv ≥ nw.

ldv

INTEGER. The number of column of t. nh ≥ nw.nh

REAL for slaqr2tDOUBLE PRECISION for dlaqr2COMPLEX for claqr2COMPLEX*16 for zlaqr2.Workspace array with dimension (ldt, nw).

INTEGER. The leading dimension of the array t just as

declared in the calling subroutine. ldt≥nw.

ldt

INTEGER. The number of rows of work array wv available

for workspace. nv≥nw.

nv

REAL for slaqr2wvDOUBLE PRECISION for dlaqr2COMPLEX for claqr2COMPLEX*16 for zlaqr2.Workspace array with dimension (ldwv, nw).

INTEGER. The leading dimension of the array wv just as

declared in the calling subroutine. ldwv≥nw.

ldwv



1356


lwork=2*nw) is sufficient, but for the optimal performancea greater workspace may be required.If lwork=-1,then the routine performs a workspace query:it estimates the optimal workspace size for the given valuesof the input parameters n, nw, ktop, and kbot. The estimateis returned in work(1). No error messages related to thelwork is issued by xerbla. Neither H nor Z are accessed.

Output Parameters

On output h has been transformed by an orthogonal/unitarysimilarity transformation, perturbed, and the returned toHessenberg form that (it is to be hoped) has some zerosubdiagonal entries.

h

On exit work(1) is set to an estimate of the optimal valueof lwork for the given values of the input parameters n,nw, ktop, and kbot.

work(1)

If wantz is .TRUE., then the orthogonal/unitary similaritytransformation is accumulated into z(iloz:ihiz, ilo:ihi)from the right.

z

If wantz is .FALSE., then z is unreferenced.

INTEGER. The number of converged eigenvalues uncoveredby the routine.

nd

INTEGER. The number of unconverged, that is approximateeigenvalues returned in sr, si or in sh that may be usedas shifts by the calling subroutine.

ns

COMPLEX for claqr2shCOMPLEX*16 for zlaqr2.Arrays, DIMENSION (kbot).The approximate eigenvalues that may be used for shiftsare stored in the sh(kbot-nd-ns+1)through thesh(kbot-nd).The converged eigenvalues are stored in thesh(kbot-nd+1)through the sh(kbot).

REAL for slaqr2sr, siDOUBLE PRECISION for dlaqr2Arrays, DIMENSION (kbot) each.

1357


The real and imaginary parts of the approximate eigenvaluesthat may be used for shifts are stored in thesr(kbot-nd-ns+1)through the sr(kbot-nd), andsi(kbot-nd-ns+1) through the si(kbot-nd), respectively.The real and imaginary parts of converged eigenvalues arestored in the sr(kbot-nd+1)through the sr(kbot), andsi(kbot-nd+1) through the si(kbot), respectively.

?laqr3Performs the orthogonal/unitary similaritytransformation of a Hessenberg matrix to detectand deflate fully converged eigenvalues from atrailing principal submatrix (aggresive earlydeflation).

Syntax

call slaqr3( wantt, wantz, n, ktop, kbot, nw, h, ldh, iloz, ihiz, z, ldz, ns,nd, sr, si, v, ldv, nh, t, ldt, nv, wv, ldwv, work, lwork )

call dlaqr3( wantt, wantz, n, ktop, kbot, nw, h, ldh, iloz, ihiz, z, ldz, ns,nd, sr, si, v, ldv, nh, t, ldt, nv, wv, ldwv, work, lwork )

call claqr3( wantt, wantz, n, ktop, kbot, nw, h, ldh, iloz, ihiz, z, ldz, ns,nd, sh, v, ldv, nh, t, ldt, nv, wv, ldwv, work, lwork )

call zlaqr3( wantt, wantz, n, ktop, kbot, nw, h, ldh, iloz, ihiz, z, ldz, ns,nd, sh, v, ldv, nh, t, ldt, nv, wv, ldwv, work, lwork )

Description

This routine accepts as input an upper Hessenberg matrix H and performs an orthogonal/unitarysimilarity transformation designed to detect and deflate fully converged eigenvalues from atrailing principal submatrix. On output H has been overwritten by a new Hessenberg matrixthat is a perturbation of an orthogonal/unitary similarity transformation of H. It is to be hopedthat the final version of H has many zero subdiagonal entries.

Input Parameters

LOGICAL.wantt

1358


If wantt = .TRUE., then the Hessenberg matrix H is fullyupdated so that the quasi-triangular/triangular Schur factormay be computed (in cooperation with the callingsubroutine).If wantt = .FALSE., then only enough of H is updated topreserve the eigenvalues.

LOGICAL.wantzIf wantz = .TRUE., then the orthogonal/unitary matrix Zis updated so that the orthogonal/unitary Schur factor maybe computed (in cooperation with the calling subroutine).If wantz = .FALSE., then Z is not referenced.

INTEGER. The order of the Hessenberg matrix H and (ifwantz = .TRUE.) the order of the orthogonal/unitary matrixZ.

n

INTEGER.ktopIt is assumed that either ktop=1 or h(ktop,ktop-1)=0.ktop and kbot together determine an isolated block alongthe diagonal of the Hessenberg matrix.

INTEGER.kbotIt is assumed without a check that either kbot=n orh(kbot+1,kbot)=0. ktop and kbot together determine anisolated block along the diagonal of the Hessenberg matrix.

INTEGER.nw

Size of the deflation window. 1≤nw≤(kbot-ktop+1).

REAL for slaqr3hDOUBLE PRECISION for dlaqr3COMPLEX for claqr3COMPLEX*16 for zlaqr3.Array, DIMENSION (ldh, n), on input the initial n-by-nsection of h stores the Hessenberg matrix H undergoingaggressive early deflation.


declared in the calling subroutine. ldh≥n.

ldh


must be applied if wantz is .TRUE.. 1≤iloz≤ihiz≤n.

iloz, ihiz

1359




declared in the calling subroutine. ldz≥1.

ldz

REAL for slaqr3vDOUBLE PRECISION for dlaqr3COMPLEX for claqr3COMPLEX*16 for zlaqr3.Workspace array with dimension (ldv, nw). An nw-by-nwwork array.


declared in the calling subroutine. ldv≥nw.

ldv

INTEGER. The number of column of t. nh≥nw.nh

REAL for slaqr3tDOUBLE PRECISION for dlaqr3COMPLEX for claqr3COMPLEX*16 for zlaqr3.Workspace array with dimension (ldt, nw).

INTEGER. The leading dimension of the array t just as

declared in the calling subroutine. ldt≥nw.

ldt

INTEGER. The number of rows of work array wv available

for workspace. nv≥nw.

nv

REAL for slaqr3wvDOUBLE PRECISION for dlaqr3COMPLEX for claqr3COMPLEX*16 for zlaqr3.Workspace array with dimension (ldwv, nw).


declared in the calling subroutine. ldwv≥nw.

ldwv

1360



INTEGER. The dimension of the array work.lworklwork=2*nw) is sufficient, but for the optimal performancea greater workspace may be required.If lwork=-1,then the routine performs a workspace query:it estimates the optimal workspace size for the given valuesof the input parameters n, nw, ktop, and kbot. The estimateis returned in work(1). No error messages related to thelwork is issued by xerbla. Neither H nor Z are accessed.

Output Parameters

On output h has been transformed by an orthogonal/unitarysimilarity transformation, perturbed, and the returned toHessenberg form that (it is to be hoped) has some zerosubdiagonal entries.

h

On exit work(1) is set to an estimate of the optimal valueof lwork for the given values of the input parameters n,nw, ktop, and kbot.

work(1)

If wantz is .TRUE., then the orthogonal/unitary similaritytransformation is accumulated into z(iloz:ihiz, ilo:ihi)from the right.

z


INTEGER. The number of converged eigenvalues uncoveredby the routine.

nd

INTEGER. The number of unconverged, that is approximateeigenvalues returned in sr, si or in sh that may be usedas shifts by the calling subroutine.

ns

COMPLEX for claqr3shCOMPLEX*16 for zlaqr3.Arrays, DIMENSION (kbot).

1361


The approximate eigenvalues that may be used for shiftsare stored in the sh(kbot-nd-ns+1)through thesh(kbot-nd).The converged eigenvalues are stored in thesh(kbot-nd+1)through the sh(kbot).

REAL for slaqr3sr, siDOUBLE PRECISION for dlaqr3Arrays, DIMENSION (kbot) each.The real and imaginary parts of the approximate eigenvaluesthat may be used for shifts are stored in thesr(kbot-nd-ns+1)through the sr(kbot-nd), andsi(kbot-nd-ns+1) through the si(kbot-nd), respectively.The real and imaginary parts of converged eigenvalues arestored in the sr(kbot-nd+1)through the sr(kbot), andsi(kbot-nd+1) through the si(kbot), respectively.

?laqr4Computes the eigenvalues of a Hessenberg matrix,and optionally the marices from the Schurdecomposition.

Syntax

call slaqr4( wantt, wantz, n, ilo, ihi, h, ldh, wr, wi, iloz, ihiz, z, ldz,work, lwork, info )

call dlaqr4( wantt, wantz, n, ilo, ihi, h, ldh, wr, wi, iloz, ihiz, z, ldz,work, lwork, info )

call claqr4( wantt, wantz, n, ilo, ihi, h, ldh, w, iloz, ihiz, z, ldz, work,lwork, info )

call zlaqr4( wantt, wantz, n, ilo, ihi, h, ldh, w, iloz, ihiz, z, ldz, work,lwork, info )

Description

This routine computes the eigenvalues of a Hessenberg matrix H, and, optionally, the matricesT and Z from the Schur decomposition H=Z*T*ZH, where T is an upper quasi-triangular/triangularmatrix (the Schur form), and Z is the orthogonal/unitary matrix of Schur vectors.

1362


Optionally Z may be postmultiplied into an input orthogonal/unitary matrix Q so that this routinecan give the Schur factorization of a matrix A which has been reduced to the Hessenberg formH by the orthogonal/unitary matrix Q: A = Q*H*QH = (QZ)*H*(QZ)H.

This routine implements one level of recursion for ?laqr0. It is a complete implementation ofthe small bulge multi-shift QR algorithm. It may be called by ?laqr0 and, for large enoughdeflation window size, it may be called by ?laqr3. This routine is identical to ?laqr0 exceptthat it calls ?laqr2 instead of ?laqr3.

Input Parameters

LOGICAL.wanttIf wantt = .TRUE., the full Schur form T is required;If wantt = .FALSE., only eigenvalues are required.


INTEGER. The order of the Hessenberg matrix H. (n ≥ 0).n

INTEGER.ilo, ihiIt is assumed that H is already upper triangular in rows andcolumns 1:ilo-1 and ihi+1:n, and if ilo > 1 thenh(ilo, ilo-1) = 0.ilo and ihi are normally set by a previous call to cgebal,and then passed to cgehrd when the matrix output bycgebal is reduced to Hessenberg form. Otherwise, ilo andihi should be set to 1 and n, respectively.

If n > 0, then 1 ≤ ilo ≤ ihi ≤ n.If n=0, then ilo=1 and ihi=0

REAL for slaqr4hDOUBLE PRECISION for dlaqr4COMPLEX for claqr4COMPLEX*16 for zlaqr4.Array, DIMENSION (ldh, n), contains the upper Hessenbergmatrix H.

INTEGER. The leading dimension of the array h. ldh ≥max(1, n).

ldh

1363



must be applied if wantz is .TRUE., 1 ≤ iloz ≤ ilo;

ihi ≤ ihiz ≤ n.

iloz, ihiz

REAL for slaqr4zDOUBLE PRECISION for dlaqr4COMPLEX for claqr4COMPLEX*16 for zlaqr4.Array, DIMENSION (ldz, ihi), contains the matrix Z ifwantz is .TRUE.. If wantz is .FALSE., z is not referenced.

INTEGER. The leading dimension of the array z.ldz

If wantz is .TRUE., then ldz ≥ max(1, ihiz). Otherwise,

ldz ≥ 1.



lwork ≥ max(1,n) is sufficient, but for the optimalperformance a greater workspace may be required, typicallyas large as 6*n.It is recommended to use the worlspace query to determinethe optimal workspace size. If lwork=-1,then the routineperforms a workspace query: it estimates the optimalworkspace size for the given values of the input parametersn, ilo, and ihi. The estimate is returned in work(1). Noerror messages related to the lwork is issued by xerbla.Neither H nor Z are accessed.

Output Parameters

If info=0 , and wantt is .TRUE., then h contains the upperquasi-triangular/triangular matrix T from the Schurdecomposition (the Schur form).

h

If info=0 , and wantt is .FALSE., then the contents of hare unspecified on exit.

1364


(The output values of h when info > 0 are given underthe description of the info parameter below.)The routines may explicitly set h(i,j) for i>j andj=1,2,...ilo-1 or j=ihi+1, ihi+2,...n.


work(1)

COMPLEX for claqr4wCOMPLEX*16 for zlaqr4.Arrays, DIMENSION(n). The computed eigenvalues ofh(ilo:ihi, ilo:ihi) are stored in w(ilo:ihi). If wantt is.TRUE., then the eigenvalues are stored in the same orderas on the diagonal of the Schur form returned in h, withw(i) = h(i,i).

REAL for slaqr4wr, wiDOUBLE PRECISION for dlaqr4Arrays, DIMENSION(ihi) each. The real and imaginary parts,respectively, of the computed eigenvalues of h(ilo:ihi,ilo:ihi) are stored in the wr(ilo:ihi) and wi(ilo:ihi).If two eigenvalues are computed as a complex conjugatepair, they are stored in consecutive elements of wr and wi,say the i-th and (i+1)-th, with wi(i)> 0 and wi(i+1) <0. If wantt is .TRUE. , then the eigenvalues are stored inthe same order as on the diagonal of the Schur formreturned in h, with wr(i) = h(i,i), and ifh(i:i+1,i:i+1)is a 2-by-2 diagonal block, thenwi(i)=sqrt(-h(i+1,i)*h(i,i+1)).

If wantz is .TRUE., then z(ilo:ihi, iloz:ihiz) isreplaced by z(ilo:ihi, iloz:ihiz)*U, where U is theorthogonal/unitary Schur factor of h(ilo:ihi, ilo:ihi).

z

If wantz is .FALSE., z is not referenced.(The output values of z when info > 0 are given underthe description of the info parameter below.)

INTEGER.info= 0: the execution is successful.

1365

> 0: if info = i, then the routine failed to compute all theeigenvalues. Elements 1:ilo-1 and i+1:n of wr and wicontain those eigenvalues which have been successfullycomputed.> 0: if wantt is .FALSE., then the remaining unconvergedeigenvalues are the eigenvalues of the upper Hessenbergmatrix rows and columns ilo through info of the finaloutput value of h.> 0: if wantt is .TRUE., then (initial value of h)*U =U*(final value of h, where U is an orthogonal/unitarymatrix. The final value of h is upper Hessenberg andquasi-triangular/triangular in rows and columns info+1through ihi.> 0: if wantz is .TRUE., then (final value of z(ilo:ihi,iloz:ihiz))=(initial value of z(ilo:ihi, iloz:ihiz)*U,where U is the orthogonal/unitary matrix in the previousexpression (regardless of the value of wantt).> 0: if wantz is .FALSE., then z is not accessed.

?laqr5Performs a single small-bulge multi-shift QR sweep.

Syntax

call slaqr5( wantt, wantz, kacc22, n, ktop, kbot, nshfts, sr, si, h, ldh,iloz, ihiz, z, ldz, v, ldv, u, ldu, nv, wv, ldwv, nh, wh, ldwh )

call dlaqr5( wantt, wantz, kacc22, n, ktop, kbot, nshfts, sr, si, h, ldh,iloz, ihiz, z, ldz, v, ldv, u, ldu, nv, wv, ldwv, nh, wh, ldwh )

call claqr5( wantt, wantz, kacc22, n, ktop, kbot, nshfts, s, h, ldh, iloz,ihiz, z, ldz, v, ldv, u, ldu, nv, wv, ldwv, nh, wh, ldwh )

call zlaqr5( wantt, wantz, kacc22, n, ktop, kbot, nshfts, s, h, ldh, iloz,ihiz, z, ldz, v, ldv, u, ldu, nv, wv, ldwv, nh, wh, ldwh )

Description

This auxiliary routine called by ?laqr0 performs a single small-bulge multi-shift QR sweep.

1366


Input Parameters

LOGICAL.wanttwantt = .TRUE. if the quasi-triangular/triangular Schurfactor is computed.wantt is set to .FALSE. otherwise.

LOGICAL.wantzwantz = .TRUE. if the orthogonal/unitary Schur factor iscomputed.wantz is set to .FALSE. otherwise.

INTEGER. Possible values are 0, 1, or 2.kacc22Specifies the computation mode of far-from-diagonalorthogonal updates.= 0: the routine does not accumulate reflections and doesnot use matrix-matrix multiply to update far-from-diagonalmatrix entries.= 1: the routine accumulates reflections and usesmatrix-matrix multiply to update the far-from-diagonalmatrix entries.= 2: the routine accumulates reflections, uses matrix-matrixmultiply to update the far-from-diagonal matrix entries, andtakes advantage of 2-by-2 block structure during matrixmultiplies.

INTEGER. The order of the Hessenberg matrix H upon whichthe routine operates.

n

INTEGER.ktop, kbotIt is assumed without a check that either ktop=1 orh(ktop,ktop-1)=0, and either kbot=n orh(kbot+1,kbot)=0.

INTEGER.nshftsNumber of simultaneous shifts, must be positive and even.

REAL for slaqr5sr, siDOUBLE PRECISION for dlaqr5Arrays, DIMENSION (nshfts) each.sr contains the real parts and si contains theimaginary parts of the nshfts shifts of origin thatdefine the multi-shift QR sweep.

1367


COMPLEX for claqr5sCOMPLEX*16 for zlaqr5.Arrays, DIMENSION (nshfts).s contains the shifts of origin that define the multi-shiftQR sweep.REAL for slaqr5hDOUBLE PRECISION for dlaqr5COMPLEX for claqr5COMPLEX*16 for zlaqr5.Array, DIMENSION (ldh, n), on input contains theHessenberg matrix.


declared in the calling routine. ldh ≥ max(1, n).

ldh


must be applied if wantz is .TRUE.. 1 ≤ iloz ≤ ihiz ≤n.

iloz, ihiz



declared in the calling routine. ldz ≥ n.

ldz

REAL for slaqr5vDOUBLE PRECISION for dlaqr5COMPLEX for claqr5COMPLEX*16 for zlaqr5.Workspace array with dimension (ldv, nshfts/2).


declared in the calling routine. ldv ≥ 3.

ldv

REAL for slaqr5uDOUBLE PRECISION for dlaqr5COMPLEX for claqr5

1368


COMPLEX*16 for zlaqr5.Workspace array with dimension (ldu, 3*nshfts-3).

INTEGER. The leading dimension of the array u just as

declared in the calling routine. ldu ≥ 3*nshfts-3.

ldu

INTEGER. The number of column in the array wh available

for workspace. nh ≥ 1.

nh

REAL for slaqr5whDOUBLE PRECISION for dlaqr5COMPLEX for claqr5COMPLEX*16 for zlaqr5.Workspace array with dimension (ldwh, nh)

INTEGER. The leading dimension of the array wh just as

declared in the calling routine. ldwh ≥ 3*nshfts-3

ldwh

INTEGER. The number of rows of the array wv available for

workspace. nv ≥ 1.

nv

REAL for slaqr5wvDOUBLE PRECISION for dlaqr5COMPLEX for claqr5COMPLEX*16 for zlaqr5.Workspace array with dimension (ldwv, 3*nshfts-3).


declared in the calling routine. ldwv ≥ nv.

ldwv

Output Parameters

On output a multi-shift QR Sweep with shiftssr(j)+i*si(j) or s(j) is applied to the isolated diagonalblock in rows and columns ktop through kbot .

h

If wantz is .TRUE., then the QR Sweep orthogonal/unitarysimilarity transformation is accumulated into z(iloz:ihiz,ilo:ihi) from the right.

z


1369


?laqsbScales a symmetric band matrix, using scalingfactors computed by ?pbequ.

Syntax

call slaqsb( uplo, n, kd, ab, ldab, s, scond, amax, equed )

call dlaqsb( uplo, n, kd, ab, ldab, s, scond, amax, equed )

call claqsb( uplo, n, kd, ab, ldab, s, scond, amax, equed )

call zlaqsb( uplo, n, kd, ab, ldab, s, scond, amax, equed )

Description

The routine equilibrates a symmetric band matrix A using the scaling factors in the vector s.

Input Parameters

CHARACTER*1.uploSpecifies whether the upper or lower triangular part of thesymmetric matrix A is stored.If uplo = 'U': upper triangular.If uplo = 'L': lower triangular.


n ≥ 0.

INTEGER. The number of super-diagonals of the matrix A ifuplo = 'U', or the number of sub-diagonals if uplo ='L'.

kd

kd ≥ 0.

REAL for slaqsbabDOUBLE PRECISION for dlaqsbCOMPLEX for claqsbCOMPLEX*16 for zlaqsbArray, DIMENSION (ldab,n). On entry, the upper or lowertriangle of the symmetric band matrix A, stored in the firstkd+1 rows of the array. The j-th column of A is stored inthe j-th column of the array ab as follows:

1370


if uplo = 'U', ab(kd+1+i-j,j) = A(i,j) for

max(1,j-kd) ≤ i ≤ j;

if uplo = 'L', ab(1+i-j,j) = A(i,j) for j ≤ i ≤min(n,j+kd).


ldab ≥ kd+1.

REAL for slaqsb/claqsbsDOUBLE PRECISION for dlaqsb/zlaqsbArray, DIMENSION (n). The scale factors for A.

REAL for slaqsb/claqsbscondDOUBLE PRECISION for dlaqsb/zlaqsbRatio of the smallest s(i) to the largest s(i).

REAL for slaqsb/claqsbamaxDOUBLE PRECISION for dlaqsb/zlaqsbAbsolute value of largest matrix entry.

Output Parameters

On exit, if info = 0, the triangular factor U or L from theCholesky factorization A = U'*U or A = L*L' of the bandmatrix A, in the same storage format as A.

ab


Application Notes

The routine uses internal parameters thresh, large, and small, which have the followingmeaning. thresh is a threshold value used to decide if scaling should be based on the ratio ofthe scaling factors. If scond < thresh, scaling is done. large and small are threshold valuesused to decide if scaling should be done based on the absolute size of the largest matrix element.If amax > large or amax < small, scaling is done.

1371

?laqspScales a symmetric/Hermitian matrix in packedstorage, using scaling factors computed by ?ppequ.

Syntax

call slaqsp( uplo, n, ap, s, scond, amax, equed )

call dlaqsp( uplo, n, ap, s, scond, amax, equed )

call claqsp( uplo, n, ap, s, scond, amax, equed )

call zlaqsp( uplo, n, ap, s, scond, amax, equed )

Description

The routine ?laqsp equilibrates a symmetric matrix A using the scaling factors in the vectors.

Input Parameters



REAL for slaqspapDOUBLE PRECISION for dlaqspCOMPLEX for claqspCOMPLEX*16 for zlaqspArray, DIMENSION (n(n+1)/2).On entry, the upper or lower triangle of the symmetricmatrix A, packed columnwise in a linear array. The j-thcolumn of A is stored in the array ap as follows:


≤ j;if uplo = 'L', ap(i + (j-1)(2n-j)/2) = A(i,j) for

j≤i≤n.

1372


REAL for slaqsp/claqspsDOUBLE PRECISION for dlaqsp/zlaqspArray, DIMENSION (n). The scale factors for A.

REAL for slaqsp/claqspscondDOUBLE PRECISION for dlaqsp/zlaqspRatio of the smallest s(i) to the largest s(i).

REAL for slaqsp/claqspamaxDOUBLE PRECISION for dlaqsp/zlaqspAbsolute value of largest matrix entry.

Output Parameters

On exit, the equilibrated matrix: diag(s)*A*diag(s), inthe same storage format as A.

ap


Application Notes


1373


?laqsyScales a symmetric/Hermitian matrix, using scalingfactors computed by ?poequ.

Syntax

call slaqsy( uplo, n, a, lda, s, scond, amax, equed )

call dlaqsy( uplo, n, a, lda, s, scond, amax, equed )

call claqsy( uplo, n, a, lda, s, scond, amax, equed )

call zlaqsy( uplo, n, a, lda, s, scond, amax, equed )

Description

The routine equilibrates a symmetric matrix A using the scaling factors in the vector s.

Input Parameters



n ≥ 0.

REAL for slaqsyaDOUBLE PRECISION for dlaqsyCOMPLEX for claqsyCOMPLEX*16 for zlaqsyArray, DIMENSION (lda,n). On entry, the symmetric matrixA.If uplo = 'U', the leading n-by-n upper triangular part ofa contains the upper triangular part of the matrix A, andthe strictly lower triangular part of a is not referenced.If uplo = 'L', the leading n-by-n lower triangular part ofa contains the lower triangular part of the matrix A, and thestrictly upper triangular part of a is not referenced.


1374


lda ≥ max(n,1).

REAL for slaqsy/claqsysDOUBLE PRECISION for dlaqsy/zlaqsyArray, DIMENSION (n). The scale factors for A.

REAL for slaqsy/claqsyscondDOUBLE PRECISION for dlaqsy/zlaqsyRatio of the smallest s(i) to the largest s(i).

REAL for slaqsy/claqsyamaxDOUBLE PRECISION for dlaqsy/zlaqsyAbsolute value of largest matrix entry.

Output Parameters

On exit, if equed = 'Y', the equilibrated matrix:diag(s)*A*diag(s).

a

CHARACTER*1.equedSpecifies whether or not equilibration was done.If equed = 'N': No equilibration.If equed = 'Y': Equilibration was done, i.e., A has beenreplaced by diag(s)*A*diag(s).

Application Notes


?laqtrSolves a real quasi-triangular system of equations,or a complex quasi-triangular system of specialform, in real arithmetic.

Syntax

call slaqtr( ltran, lreal, n, t, ldt, b, w, scale, x, work, info )

call dlaqtr( ltran, lreal, n, t, ldt, b, w, scale, x, work, info )

1375


Description

The routine ?laqtr solves the real quasi-triangular system

op(T) * p = scale*c, if lreal = .TRUE.

or the complex quasi-triangular systems

op(T + iB)*(p+iq) = scale*(c+id), if lreal = .FALSE.

in real arithmetic, where T is upper quasi-triangular.

If lreal = .FALSE., then the first diagonal block of T must be 1-by-1, B is the speciallystructured matrix

op(A) = A or A', A' denotes the conjugate transpose of matrix A.

On input,

This routine is designed for the condition number estimation in routine ?trsna.

Input Parameters

LOGICAL.ltranOn entry, ltran specifies the option of conjugate transpose:= .FALSE., op(T + iB) = T + iB,

1376


= .TRUE., op(T + iB) = (T + iB)'.

LOGICAL.lrealOn entry, lreal specifies the input matrix structure:= .FALSE., the input is complex= .TRUE., the input is real.

INTEGER.n

On entry, n specifies the order of T + iB. n ≥ 0.

REAL for slaqtrtDOUBLE PRECISION for dlaqtrArray, dimension (ldt,n). On entry, t contains a matrix inSchur canonical form. If lreal = .FALSE., then the firstdiagonal block of t must be 1-by-1.

INTEGER. The leading dimension of the matrix T.ldt

ldt ≥ max(1,n).

REAL for slaqtrbDOUBLE PRECISION for dlaqtrArray, dimension (n). On entry, b contains the elements toform the matrix B as described above. If lreal = .TRUE.,b is not referenced.

REAL for slaqtrwDOUBLE PRECISION for dlaqtrOn entry, w is the diagonal element of the matrix B.If lreal = .TRUE., w is not referenced.

REAL for slaqtrxDOUBLE PRECISION for dlaqtrArray, dimension (2n). On entry, x contains the right handside of the system.

REAL for slaqtrworkDOUBLE PRECISION for dlaqtrWorkspace array, dimension (n).

Output Parameters

REAL for slaqtrscaleDOUBLE PRECISION for dlaqtrOn exit, scale is the scale factor.

1377


On exit, X is overwritten by the solution.x

INTEGER.infoIf info = 0: successful exit.If info = 1: the some diagonal 1-by-1 block has beenperturbed by a small number smin to keep nonsingularity.If info = 2: the some diagonal 2-by-2 block has beenperturbed by a small number in ?laln2 to keepnonsingularity.


?lar1vComputes the (scaled) r-th column of the inverseof the submatrix in rows b1 through bn oftridiagonal matrix.

Syntax

call slar1v( n, b1, bn, lambda, d, l, ld, lld, pivmin, gaptol, z, wantnc,negcnt, ztz, mingma, r, isuppz, nrminv, resid, rqcorr, work )

call dlar1v( n, b1, bn, lambda, d, l, ld, lld, pivmin, gaptol, z, wantnc,negcnt, ztz, mingma, r, isuppz, nrminv, resid, rqcorr, work )

call clar1v( n, b1, bn, lambda, d, l, ld, lld, pivmin, gaptol, z, wantnc,negcnt, ztz, mingma, r, isuppz, nrminv, resid, rqcorr, work )

call zlar1v( n, b1, bn, lambda, d, l, ld, lld, pivmin, gaptol, z, wantnc,negcnt, ztz, mingma, r, isuppz, nrminv, resid, rqcorr, work )

Description

The routine ?lar1v computes the (scaled) r-th column of the inverse of the submatrix in rows

b1 through bn of the tridiagonal matrix L*D*LT - λ*I. When λ is close to an eigenvalue, thecomputed vector is an accurate eigenvector. Usually, r corresponds to the index where theeigenvector is largest in magnitude.

The following steps accomplish this computation :

1378


• Stationary qd transform, L*D*LT - λ*I = L(+)*D(+)*L(+)T

• Progressive qd transform, L*D*LT - λ*I = U(-)*D(-)*U(-)T,

• Computation of the diagonal elements of the inverse of L*D*LT - λ*I by combining theabove transforms, and choosing r as the index where the diagonal of the inverse is (one ofthe) largest in magnitude.

• Computation of the (scaled) r-th column of the inverse using the twisted factorizationobtained by combining the top part of the stationary and the bottom part of the progressivetransform.

Input Parameters

INTEGER. The order of the matrix L*D*LT.n

INTEGER. First index of the submatrix of L*D*LT.b1

INTEGER. Last index of the submatrix of L*D*LT.bn

REAL for slar1v/clar1vlambdaDOUBLE PRECISION for dlar1v/zlar1vThe shift. To compute an accurate eigenvector, lambdashould be a good approximation to an eigenvalue of L*D*LT.

REAL for slar1v/clar1vlDOUBLE PRECISION for dlar1v/zlar1vArray, DIMENSION (n-1).The (n-1) subdiagonal elements of the unit bidiagonal matrixL, in elements 1 to n-1.

REAL for slar1v/clar1vdDOUBLE PRECISION for dlar1v/zlar1vArray, DIMENSION (n).The n diagonal elements of the diagonal matrix D.

REAL for slar1v/clar1vldDOUBLE PRECISION for dlar1v/zlar1vArray, DIMENSION (n-1).The n-1 elements Li*Di.

REAL for slar1v/clar1vlldDOUBLE PRECISION for dlar1v/zlar1vArray, DIMENSION (n-1).The n-1 elements Li*Li*Di.

1379


REAL for slar1v/clar1vpivminDOUBLE PRECISION for dlar1v/zlar1vThe minimum pivot in the Sturm sequence.

REAL for slar1v/clar1vgaptolDOUBLE PRECISION for dlar1v/zlar1vTolerance that indicates when eigenvector entries arenegligible with respect to their contribution to the residual.

REAL for slar1vzDOUBLE PRECISION for dlar1vCOMPLEX for clar1vCOMPLEX*16 for zlar1vArray, DIMENSION (n). All entries of z must be set to 0.

LOGICAL.wantncSpecifies whether negcnt has to be computed.

INTEGER.rThe twist index for the twisted factorization used to compute

z. On input, 0 ≤ r ≤ n. If r is input as 0, r is set to the indexwhere (L*D*LT - lambda*I)-1 is largest in magnitude. If

1 ≤ r ≤ n, r is unchanged.

REAL for slar1v/clar1vworkDOUBLE PRECISION for dlar1v/zlar1vWorkspace array, DIMENSION (4*n).

Output Parameters

REAL for slar1vzDOUBLE PRECISION for dlar1vCOMPLEX for clar1vCOMPLEX*16 for zlar1vArray, DIMENSION (n). The (scaled) r-th column of theinverse. z(r) is returned to be 1.

INTEGER. If wantnc is .TRUE. then negcnt = the numberof pivots < pivmin in the matrix factorization L*D*LT, andnegcnt = -1 otherwise.

negcnt

REAL for slar1v/clar1vztzDOUBLE PRECISION for dlar1v/zlar1vThe square of the 2-norm of z.

1380

REAL for slar1v/clar1vmingmaDOUBLE PRECISION for dlar1v/zlar1vThe reciprocal of the largest (in magnitude) diagonal elementof the inverse of L*D*LT - lambda*I.

On output, r is the twist index used to compute z. Ideally,r designates the position of the maximum entry in theeigenvector.

r

INTEGER. Array, DIMENSION (2). The support of the vectorin Z, that is, the vector z is nonzero only in elementsisuppz(1) through isuppz(2).

isuppz

REAL for slar1v/clar1vnrminvDOUBLE PRECISION for dlar1v/zlar1vEquals 1/sqrt( ztz ).

REAL for slar1v/clar1vresidDOUBLE PRECISION for dlar1v/zlar1vThe residual of the FP vector.resid = ABS( mingma )/sqrt( ztz ).

REAL for slar1v/clar1vrqcorrDOUBLE PRECISION for dlar1v/zlar1vThe Rayleigh Quotient correction to lambda.rqcorr = mingma/ztz.

?lar2vApplies a vector of plane rotations with real cosinesand real/complex sines from both sides to asequence of 2-by-2 symmetric/Hermitian matrices.

Syntax

call slar2v( n, x, y, z, incx, c, s, incc )

call dlar2v( n, x, y, z, incx, c, s, incc )

call clar2v( n, x, y, z, incx, c, s, incc )

call zlar2v( n, x, y, z, incx, c, s, incc )

1381


Description

The routine ?lar2v applies a vector of real/complex plane rotations with real cosines from bothsides to a sequence of 2-by-2 real symmetric or complex Hermitian matrices, defined by theelements of the vectors x, y and z. For i = 1,2,...,n

Input Parameters

INTEGER. The number of plane rotations to be applied.n

REAL for slar2vx, y, zDOUBLE PRECISION for dlar2vCOMPLEX for clar2vCOMPLEX*16 for zlar2vArrays, DIMENSION (1+(n-1)*incx) each. Contain thevectors x, y and z, respectively. For all flavors of ?lar2v,elements of x and y are assumed to be real.

INTEGER. The increment between elements of x, y, and z.incx > 0.

incx

REAL for slar2v/clar2vcDOUBLE PRECISION for dlar2v/zlar2vArray, DIMENSION (1+(n-1)*incc). The cosines of the planerotations.

REAL for slar2vsDOUBLE PRECISION for dlar2vCOMPLEX for clar2vCOMPLEX*16 for zlar2vArray, DIMENSION (1+(n-1)*incc). The sines of the planerotations.

INTEGER. The increment between elements of c and s. incc> 0.

incc

1382


Output Parameters

Vectors x, y and z, containing the results of transform.x, y, z

?larfApplies an elementary reflector to a generalrectangular matrix.

Syntax

call slarf( side, m, n, v, incv, tau, c, ldc, work )

call dlarf( side, m, n, v, incv, tau, c, ldc, work )

call clarf( side, m, n, v, incv, tau, c, ldc, work )

call zlarf( side, m, n, v, incv, tau, c, ldc, work )

Description

The routine applies a real/complex elementary reflector H to a real/complex m-by-n matrix C,from either the left or the right. H is represented in the form

H = I - tau*v*v',

where tau is a real/complex scalar and v is a real/complex vector.

If tau = 0, then H is taken to be the unit matrix. For clarf/zlarf, to apply H' (the conjugatetranspose of H), supply conjg(tau) instead of tau.

Input Parameters

CHARACTER*1.sideIf side = 'L': form H*CIf side = 'R': form C*H.

INTEGER. The number of rows of the matrix C.m

INTEGER. The number of columns of the matrix C.n

REAL for slarfvDOUBLE PRECISION for dlarfCOMPLEX for clarfCOMPLEX*16 for zlarfArray, DIMENSION

1383


(1 + (m-1)*abs(incv)) if side = 'L' or(1 + (n-1)*abs(incv)) if side = 'R'. The vector v inthe representation of H. v is not used if tau = 0.

INTEGER. The increment between elements of v.incv

incv ≠ 0.

REAL for slarftauDOUBLE PRECISION for dlarfCOMPLEX for clarfCOMPLEX*16 for zlarfThe value tau in the representation of H.

REAL for slarfcDOUBLE PRECISION for dlarfCOMPLEX for clarfCOMPLEX*16 for zlarfArray, DIMENSION (ldc,n).On entry, the m-by-n matrix C.

INTEGER. The leading dimension of the array c.ldc

ldc ≥ max(1,m).

REAL for slarfworkDOUBLE PRECISION for dlarfCOMPLEX for clarfCOMPLEX*16 for zlarfWorkspace array, DIMENSION(n) if side = 'L' or(m) if side = 'R'.

Output Parameters

On exit, C is overwritten by the matrix H*C if side = 'L',or C*H if side = 'R'.

c

1384


?larfbApplies a block reflector or itstranspose/conjugate-transpose to a generalrectangular matrix.

Syntax

call slarfb( side, trans, direct, storev, m, n, k, v, ldv, t, ldt, c, ldc,work, ldwork )

call dlarfb( side, trans, direct, storev, m, n, k, v, ldv, t, ldt, c, ldc,work, ldwork )

call clarfb( side, trans, direct, storev, m, n, k, v, ldv, t, ldt, c, ldc,work, ldwork )

call zlarfb( side, trans, direct, storev, m, n, k, v, ldv, t, ldt, c, ldc,work, ldwork )

Description

The routine ?larfb applies a complex block reflector H or its transpose H' to a complex m-by-nmatrix C from either left or right.

Input Parameters

CHARACTER*1.sideIf side = 'L': apply H or H' from the leftIf side = 'R': apply H or H' from the right

CHARACTER*1.transIf trans = 'N': apply H (No transpose)If trans = 'C': apply H' (Conjugate transpose)

CHARACTER*1.directIndicates how H is formed from a product of elementaryreflectorsIf direct = 'F': H = H(1) H(2) . . . H(k) (forward)If direct = 'B': H = H(k) . . . H(2) H(1) (backward)

CHARACTER*1.storevIndicates how the vectors which define the elementaryreflectors are stored:If storev = 'C': Column-wise

1385


If storev = 'R': Row-wise



INTEGER. The order of the matrix T (equal to the numberof elementary reflectors whose product defines the blockreflector).

k

REAL for slarfbvDOUBLE PRECISION for dlarfbCOMPLEX for clarfbCOMPLEX*16 for zlarfbArray, DIMENSION(ldv, k) if storev = 'C'(ldv, m) if storev = 'R' and side = 'L'(ldv, n) if storev = 'R' and side = 'R'The matrix v.

INTEGER. The leading dimension of the array v.ldv

If storev = 'C' and side = 'L', ldv ≥ max(1,m);

if storev = 'C' and side = 'R', ldv ≥ max(1,n);

if storev = 'R', ldv ≥ k.

REAL for slarfbtDOUBLE PRECISION for dlarfbCOMPLEX for clarfbCOMPLEX*16 for zlarfbArray, DIMENSION (ldt,k).Contains the triangular k-by-k matrix T in the representationof the block reflector.

INTEGER. The leading dimension of the array t.LDT

ldt ≥ k.

REAL for slarfbcDOUBLE PRECISION for dlarfbCOMPLEX for clarfbCOMPLEX*16 for zlarfbArray, DIMENSION (ldc,n).On entry, the m-by-n matrix C.


1386


ldc ≥ max(1,m).

REAL for slarfbworkDOUBLE PRECISION for dlarfbCOMPLEX for clarfbCOMPLEX*16 for zlarfbWorkspace array, DIMENSION (ldwork, k).

INTEGER. The leading dimension of the array work.ldwork

If side = 'L', ldwork ≥ max(1, n);

if side = 'R', ldwork ≥ max(1, m).

Output Parameters

On exit, c is overwritten by H*C, or H'*C, or C*H, or C*H'.c

?larfgGenerates an elementary reflector (Householdermatrix).

Syntax

call slarfg( n, alpha, x, incx, tau )

call dlarfg( n, alpha, x, incx, tau )

call clarfg( n, alpha, x, incx, tau )

call zlarfg( n, alpha, x, incx, tau )

Description

The routine ?larfg generates a real/complex elementary reflector H of order n, such that

where alpha and beta are scalars (with beta real for all flavors), and x is an (n-1)-elementreal/complex vector. H is represented in the form

1387


where tau is a real/complex scalar and v is a real/complex (n-1)-element vector. Note that forclarfg/zlarfg, H is not Hermitian.

If the elements of x are all zero (and, for complex flavors, alpha is real), then tau = 0 and His taken to be the unit matrix.

Otherwise, 1 ≤ tau ≤ 2 (for real flavors), or

1 ≤ Re(tau) ≤ 2 and abs(tau-1) ≤ 1 (for complex flavors).

Input Parameters

INTEGER. The order of the elementary reflector.n

REAL for slarfgalphaDOUBLE PRECISION for dlarfgCOMPLEX for clarfgCOMPLEX*16 for zlarfg On entry, the value alpha.

REAL for slarfgxDOUBLE PRECISION for dlarfgCOMPLEX for clarfgCOMPLEX*16 for zlarfgArray, DIMENSION (1+(n-2)*abs(incx)).On entry, the vector x.

INTEGER.incxThe increment between elements of x. incx > 0.

Output Parameters

On exit, it is overwritten with the value beta.alpha

On exit, it is overwritten with the vector v.x

REAL for slarfgtauDOUBLE PRECISION for dlarfgCOMPLEX for clarfg

1388


COMPLEX*16 for zlarfg The value tau.

?larftForms the triangular factor T of a block reflector H= I - V*T*VH.

Syntax

call slarft( direct, storev, n, k, v, ldv, tau, t, ldt )

call dlarft( direct, storev, n, k, v, ldv, tau, t, ldt )

call clarft( direct, storev, n, k, v, ldv, tau, t, ldt )

call zlarft( direct, storev, n, k, v, ldv, tau, t, ldt )

Description

The routine ?larft forms the triangular factor T of a real/complex block reflector H of ordern, which is defined as a product of k elementary reflectors.

If direct = 'F', H = H(1) H(2) . . . H(k) and T is upper triangular;

If direct = 'B', H = H(k) . . . H(2) H(1) and T is lower triangular.

If storev = 'C', the vector which defines the elementary reflector H(i) is stored in the i-thcolumn of the array v, and H = I - V*T*V' .

If storev = 'R', the vector which defines the elementary reflector H(i) is stored in the i-throw of the array v, and H = I - V'*T*V.

Input Parameters

CHARACTER*1.directSpecifies the order in which the elementary reflectors aremultiplied to form the block reflector:= 'F': H = H(1) H(2) . . . H(k) (forward)= 'B': H = H(k) . . . H(2) H(1) (backward)

CHARACTER*1.storevSpecifies how the vectors which define the elementaryreflectors are stored (see also Application Notes below):= 'C': column-wise= 'R': row-wise.

1389


INTEGER. The order of the block reflector H. n ≥ 0.n

INTEGER. The order of the triangular factor T (equal to the

number of elementary reflectors). k ≥ 1.

k

REAL for slarftvDOUBLE PRECISION for dlarftCOMPLEX for clarftCOMPLEX*16 for zlarftArray, DIMENSION(ldv, k) if storev = 'C' or(ldv, n) if storev = 'R'.The matrix V.


If storev = 'C', ldv ≥ max(1,n);


REAL for slarfttauDOUBLE PRECISION for dlarftCOMPLEX for clarftCOMPLEX*16 for zlarftArray, DIMENSION (k). tau(i) must contain the scalar factorof the elementary reflector H(i).

INTEGER. The leading dimension of the output array t. ldt

≥ k.

ldt

Output Parameters

REAL for slarfttDOUBLE PRECISION for dlarftCOMPLEX for clarftCOMPLEX*16 for zlarftArray, DIMENSION (ldt,k). The k-by-k triangular factor Tof the block reflector. If direct = 'F', T is uppertriangular; if direct = 'B', T is lower triangular. The restof the array is not used.

The matrix V.v

1390


Application Notes

The shape of the matrix V and the storage of the vectors which define the H(i) is best illustratedby the following example with n = 5 and k = 3. The elements equal to 1 are not stored; thecorresponding array elements are modified but restored on exit. The rest of the array is notused.

1391


?larfxApplies an elementary reflector to a generalrectangular matrix, with loop unrolling when thereflector has order ≥ 10.

Syntax

call slarfx( side, m, n, v, tau, c, ldc, work )

call dlarfx( side, m, n, v, tau, c, ldc, work )

call clarfx( side, m, n, v, tau, c, ldc, work )

call zlarfx( side, m, n, v, tau, c, ldc, work )

Description

The routine ?larfx applies a real/complex elementary reflector H to a real/complex m-by-nmatrix C, from either the left or the right.

H is represented in the form

H = I - tau*v*v', where tau is a real/complex scalar and v is a real/complex vector.

If tau = 0, then H is taken to be the unit matrix

Input Parameters

CHARACTER*1.sideIf side = 'L': form H*CIf side = 'R': form C*H.



REAL for slarfxvDOUBLE PRECISION for dlarfxCOMPLEX for clarfxCOMPLEX*16 for zlarfxArray, DIMENSION(m) if side = 'L' or(n) if side = 'R'.The vector v in the representation of H.

REAL for slarfxtau

1392


DOUBLE PRECISION for dlarfxCOMPLEX for clarfxCOMPLEX*16 for zlarfxThe value tau in the representation of H.

REAL for slarfxcDOUBLE PRECISION for dlarfxCOMPLEX for clarfxCOMPLEX*16 for zlarfxArray, DIMENSION (ldc,n). On entry, the m-by-n matrix C.

INTEGER. The leading dimension of the array c. lda ≥ (1,m).ldc

REAL for slarfxworkDOUBLE PRECISION for dlarfxCOMPLEX for clarfxCOMPLEX*16 for zlarfxWorkspace array, DIMENSION(n) if side = 'L' or(m) if side = 'R'.work is not referenced if H has order < 11.

Output Parameters


c

?largvGenerates a vector of plane rotations with realcosines and real/complex sines.

Syntax

call slargv( n, x, incx, y, incy, c, incc )

call dlargv( n, x, incx, y, incy, c, incc )

call clargv( n, x, incx, y, incy, c, incc )

call zlargv( n, x, incx, y, incy, c, incc )

1393

Description

The routine generates a vector of real/complex plane rotations with real cosines, determinedby elements of the real/complex vectors x and y.

For slargv/dlargv:

For clargv/zlargv:

where c(i)2 + abs(s(i))2 = 1 and the following conventions are used (these are the sameas in clartg/zlartg but differ from the BLAS Level 1 routine crotg/zrotg):

If yi = 0, then c(i) = 1 and s(i) = 0;

If xi = 0, then c(i) = 0 and s(i) is chosen so that ri is real.

Input Parameters

INTEGER. The number of plane rotations to be generated.n

REAL for slargvx, yDOUBLE PRECISION for dlargvCOMPLEX for clargvCOMPLEX*16 for zlargvArrays, DIMENSION (1+(n-1)*incx) and (1+(n-1)*incy),respectively. On entry, the vectors x and y.

INTEGER. The increment between elements of x.incxincx > 0.

INTEGER. The increment between elements of y.incyincy > 0.

1394


INTEGER. The increment between elements of the outputarray c. incc > 0.

incc

Output Parameters

On exit, x(i) is overwritten by ai (for real flavors), or by ri(for complex flavors), for i = 1,...,n.

x

On exit, the sines s(i) of the plane rotations.y

REAL for slargv/clargvcDOUBLE PRECISION for dlargv/zlargvArray, DIMENSION (1+(n-1)*incc). The cosines of the planerotations.

?larnvReturns a vector of random numbers from auniform or normal distribution.

Syntax

call slarnv( idist, iseed, n, x )

call dlarnv( idist, iseed, n, x )

call clarnv( idist, iseed, n, x )

call zlarnv( idist, iseed, n, x )

Description

The routine ?larnv returns a vector of n random real/complex numbers from a uniform ornormal distribution.

This routine calls the auxiliary routine ?laruv to generate random real numbers from a uniform(0,1) distribution, in batches of up to 128 using vectorisable code. The Box-Muller method isused to transform numbers from a uniform to a normal distribution.

Input Parameters

INTEGER. Specifies the distribution of the random numbers:for slarnv and dlanrv:

idist

= 1: uniform (0,1)= 2: uniform (-1,1)

1395


= 3: normal (0,1).for clarnv and zlanrv:= 1: real and imaginary parts each uniform (0,1)= 2: real and imaginary parts each uniform (-1,1)= 3: real and imaginary parts each normal (0,1)= 4: uniformly distributed on the disc abs(z) < 1= 5: uniformly distributed on the circle abs(z) = 1

INTEGER. Array, DIMENSION (4).iseedOn entry, the seed of the random number generator; thearray elements must be between 0 and 4095, and iseed(4)must be odd.

INTEGER. The number of random numbers to be generated.n

Output Parameters

REAL for slarnvxDOUBLE PRECISION for dlarnvCOMPLEX for clarnvCOMPLEX*16 for zlarnvArray, DIMENSION (n). The generated random numbers.

On exit, the seed is updated.iseed

?larraComputes the splitting points with the specifiedthreshold.

Syntax

call slarra( n, d, e, e2, spltol, tnrm, nsplit, isplit, info )

call dlarra( n, d, e, e2, spltol, tnrm, nsplit, isplit, info )

Description

This routine computes the splitting points with the specified threshold and sets any "small"off-diagonal elements to zero.

Input Parameters

INTEGER. The order of the matrix (n > 1).n

1396

REAL for slarradDOUBLE PRECISION for dlarraArray, DIMENSION (n).Contains n diagonal elements of the tridiagonal matrix T.

REAL for slarraeDOUBLE PRECISION for dlarraArray, DIMENSION (n).First (n-1) entries contain the subdiagonal elements of thetridiagonal matrix T; e(n) need not be set.

REAL for slarrae2DOUBLE PRECISION for dlarraArray, DIMENSION (n).First (n-1) entries contain the squares of the subdiagonalelements of the tridiagonal matrix T; e2(n) need not beset.

REAL for slarraspltolDOUBLE PRECISION for dlarraThe threshold for splitting. Two criteria can be used:spltol<0 : criterion based on absolute off-diagonal value;spltol>0 : criterion that preserves relative accuracy.

REAL for slarratnrmDOUBLE PRECISION for dlarraThe norm of the matrix.

Output Parameters

On exit, the entries e(isplit(i)), 1 ≤ i ≤ nsplit, areset to zero, the other entries of e are untouched.

e

On exit, the entries e2(isplit(i)), 1 ≤ i ≤ nsplit,are set to zero.

e2

INTEGER.nsplit

The number of blocks the matrix T splits into. 1 ≤ nsplit

≤ n

INTEGER.Array, DIMENSION (n).

isplit

1397

The splitting points, at which T breaks up into blocks. Thefirst block consists of rows/columns 1 to isplit(1), thesecond of rows/columns isplit(1)+1 through isplit(2),and so on, and the nsplit-th consists of rows/columnsisplit(nsplit-1)+1 through isplit(nsplit)=n.

INTEGER.info= 0: successful exit.

?larrbProvides limited bisection to locate eigenvalues formore accuracy.

Syntax

call slarrb( n, d, lld, ifirst, ilast, rtol1, rtol2, offset, w, wgap, werr,work, iwork, pivmin, spdiam, twist, info )

call dlarrb( n, d, lld, ifirst, ilast, rtol1, rtol2, offset, w, wgap, werr,work, iwork, pivmin, spdiam, twist, info )

Description

Given the relatively robust representation (RRR) L*D*LT, the routine does “limited” bisectionto refine the eigenvalues of L*D*LT, w( ifirst-offset ) through w( ilast-offset ), to moreaccuracy. Initial guesses for these eigenvalues are input in w. The corresponding estimate ofthe error in these guesses and their gaps are input in werr and wgap, respectively. Duringbisection, intervals [left, right] are maintained by storing their mid-points and semi-widthsin the arrays w and werr respectively.

Input Parameters

INTEGER. The order of the matrix.n

REAL for slarrbdDOUBLE PRECISION for dlarrbArray, DIMENSION (n). The n diagonal elements of thediagonal matrix D.

REAL for slarrblldDOUBLE PRECISION for dlarrbArray, DIMENSION (n-1).

1398


The n-1 elements Li*Li*Di.

INTEGER. The index of the first eigenvalue to be computed.ifirst

INTEGER. The index of the last eigenvalue to be computed.ilast

REAL for slarrbrtol1, rtol2DOUBLE PRECISION for dlarrbTolerance for the convergence of the bisection intervals. Aninterval [left, right] has converged ifRIGHT-LEFT.LT.MAX( rtol1*gap,rtol2*max(|left|,|right|) ), where gap is the(estimated) distance to the nearest eigenvalue.

INTEGER. Offset for the arrays w, wgap and werr, that is,the ifirst-offset through ilast-offset elements of thesearrays are to be used.

offset

REAL for slarrbwDOUBLE PRECISION for dlarrbArray, DIMENSION (n). On input, w( ifirst-offset ) throughw( ilast-offset ) are estimates of the eigenvalues ofL*D*LT indexed ifirst through ilast.

REAL for slarrbwgapDOUBLE PRECISION for dlarrbArray, DIMENSION (n-1). The estimated gaps betweenconsecutive eigenvalues of L*D*LT, that is, wgap(i-offset)is the gap between eigenvalues i and i+1. Note that ifIFIRST.EQ.ILAST then wgap(ifirst-offset) must be setto 0.

REAL for slarrbwerrDOUBLE PRECISION for dlarrbArray, DIMENSION (n). On input, werr(ifirst-offset)through werr(ilast-offset) are the errors in the estimatesof the corresponding elements in w.

REAL for slarrbworkDOUBLE PRECISION for dlarrbWorkspace array, DIMENSION (2*n).

REAL for slarrbpivminDOUBLE PRECISION for dlarrbThe minimum pivot in the Sturm sequence.

1399


REAL for slarrbspdiamDOUBLE PRECISION for dlarrbThe spectral diameter of the matrix.

INTEGER. The twist index for the twisted factorization thatis used for the negcount.

twist

twist = n: Compute negcount from L*D*LT - lambda*i= L+ * D+ * L+T

twist = n: Compute negcount from L*D*LT - lambda*i= U- * D- * U-T

twist = n: Compute negcount from L*D*LT - lambda*i= Nr*D r*Nr


Output Parameters

On output, the estimates of the eigenvalues are“refined”.w

On output, the gaps are refined.wgap

On output, “refined” errors in the estimates of w.werr

INTEGER.infoError flag.

?larrcComputes the number of eigenvalues of thesymmetric tridiagonal matrix.

Syntax

call slarrc( jobt, n, vl, vu, d, e, pivmin, eigcnt, lcnt, rcnt, info )

call dlarrc( jobt, n, vl, vu, d, e, pivmin, eigcnt, lcnt, rcnt, info )

Description

This routine finds the number of eigenvalues of the symmetric tridiagonal matrix T or of itsfactorization L*D*L**T in the specified interval.

1400


Input Parameters

CHARACTER*1.jobt= 'T': computes Sturm count for matrix T.= 'L': computes Sturm count for matrix L*D*L**T.

INTEGER.nThe order of the matrix. (n > 1).

REAL for slarrcvl,vuDOUBLE PRECISION for dlarrcThe lower and upper bounds for the eigenvalues.REAL for slarrcdDOUBLE PRECISION for dlarrcArray, DIMENSION (n).If jobt= 'T': contains the n diagonal elements of thetridiagonal matrix T.If jobt= 'L': contains the n diagonal elements of thediagonal matrix D.

REAL for slarrceDOUBLE PRECISION for dlarrcArray, DIMENSION (n).If jobt= 'T': contains the (n-1)offdiagonal elements ofthe matrix T.If jobt= 'L': contains the (n-1)offdiagonal elements ofthe matrix L.

REAL for slarrcpivminDOUBLE PRECISION for dlarrcThe minimum pivot in the Sturm sequence for the matrixT.

Output Parameters

INTEGER.The number of eigenvalues of the symmetric tridiagonalmatrix T that are in the half-open interval (vl,vu].

eigcnt

INTEGER.lcnt,rcntThe left and right negcounts of the interval.

INTEGER.info

1401


Now it is not used and always is set to 0.

?larrdComputes the eigenvalues of a symmetrictridiagonal matrix to suitable accuracy.

Syntax

call slarrd( range, order, n, vl, vu, il, iu, gers, reltol, d, e, e2, pivmin,nsplit, isplit, m, w, werr, wl, wu, iblock, indexw, work, iwork, info )

call dlarrd( range, order, n, vl, vu, il, iu, gers, reltol, d, e, e2, pivmin,nsplit, isplit, m, w, werr, wl, wu, iblock, indexw, work, iwork, info )

Description

The routine computes the eigenvalues of a symmetric tridiagonal matrix T to suitable accuracy.This is an auxiliary code to be called from ?stemr. The user may ask for all eigenvalues, alleigenvalues in the half-open interval (vl, vu], or the il-th through iu-th eigenvalues.

To avoid overflow, the matrix must be scaled so that its largest element is no greater than(overflow1/2*underflow1/4) in absolute value, and for greatest accuracy, it should not bemuch smaller than that. (For more details see [Kahan66].

Input Parameters

CHARACTER.range= 'A': ("All") all eigenvalues will be found.= 'V': ("Value") all eigenvalues in the half-open interval(vl, vu] will be found.= 'I': ("Index") the il-th through iu-th eigenvalues willbe found.

CHARACTER.order= 'B': ("By block") the eigenvalues will be grouped bysplit-off block (see iblock, isplit below) and orderedfrom smallest to largest within the block.= 'E': ("Entire matrix") the eigenvalues for the entirematrix will be ordered from smallest to largest.

INTEGER. The order of the tridiagonal matrix T (n ≥ 1).n

1402


REAL for slarrdvl,vuDOUBLE PRECISION for dlarrdIf range = 'V': the lower and upper bounds of theinterval to be searched for eigenvalues. Eigenvalues lessthan or equal to vl, or greater than vu, will not be returned.vl < vu.If range = 'A' or 'I': not referenced.

INTEGER.il,iuIf range = 'I': the indices (in ascending order) of the

smallest and largest eigenvalues to be returned. 1 ≤ il ≤

iu ≤ n, if n > 0; il=1 and iu=0 if n=0.If range = 'A' or 'V': not referenced.

REAL for slarrdgersDOUBLE PRECISION for dlarrdArray, DIMENSION (2*n).The n Gerschgorin intervals (the i-th Gerschgorininterval is (gers(2*i-1), gers(2*i)).

REAL for slarrdreltolDOUBLE PRECISION for dlarrdThe minimum relative width of an interval. When an intervalis narrower than reltol times the larger (in magnitude)endpoint, then it is considered to be sufficiently small, thatis converged. Note: this should always be at leastradix*machine epsilon.

REAL for slarrddDOUBLE PRECISION for dlarrdArray, DIMENSION (n).Contains n diagonal elements of the tridiagonal matrix T.

REAL for slarrdeDOUBLE PRECISION for dlarrdArray, DIMENSION (n-1).Contains (n-1) off-diagonal elements of the tridiagonalmatrix T.

REAL for slarrde2DOUBLE PRECISION for dlarrdArray, DIMENSION (n-1).

1403

Contains (n-1) squared off-diagonal elements of thetridiagonal matrix T.

REAL for slarrdpivminDOUBLE PRECISION for dlarrdThe minimum pivot in the Sturm sequence for the matrixT.

INTEGER.nsplit

The number of diagonal blocks the matrix T . 1 ≤ nsplit

≤ n

INTEGER.isplitArrays, DIMENSION (n).The splitting points, at which T breaks up into submatrices.The first submatrix consists of rows/columns 1 toisplit(1), the second of rows/columns isplit(1)+1through isplit(2), and so on, and the nsplit-th consistsof rows/columns isplit(nsplit-1)+1 throughisplit(nsplit)=n.(Only the first nsplit elements actually is used, but sincethe user cannot know a priori value of nsplit, n wordsmust be reserved for isplit.)

REAL for slarrdworkDOUBLE PRECISION for dlarrdWorkspace array, DIMENSION (4*n).


Output Parameters

INTEGER.m

The actual number of eigenvalues found. 0 ≤ m ≤ n. (Seealso the description of info=2,3.)

REAL for slarrdwDOUBLE PRECISION for dlarrdArray, DIMENSION (n).

1404


The first m elements of w contain the eigenvalueapproximations. ?laprd computes an interval Ij = (aj,bj] that includes eigenvalue j. The eigenvalueapproximation is given as the interval midpoint w(j)=(aj+bj)/2. The corresponding error is bounded by werr(j)= abs(aj-bj)/2.

REAL for slarrdwerrDOUBLE PRECISION for dlarrdArray, DIMENSION (n).The error bound on the corresponding eigenvalueapproximation in w.

REAL for slarrdwl, wuDOUBLE PRECISION for dlarrdThe interval (wl, wu] contains all the wanted eigenvalues.If range = 'V': then wl=vl and wu=vu.If range = 'A': then wl and wu are the global Gerschgorinbounds on the spectrum.If range = 'I': then wl and wu are computed by ?laebzfrom the index range specified.

INTEGER.iblockArray, DIMENSION (n).At each row/column j where e(j) is zero or small, thematrix T is considered to split into a block diagonal matrix.If info = 0, then iblock(i) specifies to which block (from1 to the number of blocks) the eigenvalue w(i) belongs.(The routine may use the remaining n-m elements asworkspace.)

INTEGER.indexwArray, DIMENSION (n).The indices of the eigenvalues within each block (submatrix);for example, indexw(i)= j and iblock(i)=k imply thatthe i-th eigenvalue w(i) is the j-th eigenvalue in block k.

INTEGER.info= 0: successful exit.< 0: if info = -i, the i-th argument has an illegal value> 0: some or all of the eigenvalues fail to converge or arenot computed:

1405

=1 or 3: bisection fail to converge for some eigenvalues;these eigenvalues are flagged by a negative block number.The effect is that the eigenvalues may not be as accurateas the absolute and relative tolerances.=2 or 3: range='I' only: not all of the eigenvalues il:iuare found.=4: range='I', and the Gershgorin interval initially usedis too small. No eigenvalues are computed.

?larreGiven the tridiagonal matrix T, sets smalloff-diagonal elements to zero and for eachunreduced block Ti, finds base representations andeigenvalues.

Syntax

call slarre( range, n, vl, vu, il, iu, d, e, e2, rtol1, rtol2, spltol, nsplit,isplit, m, w, werr, wgap, iblock, indexw, gers, pivmin, work, iwork, info )

call dlarre( range, n, vl, vu, il, iu, d, e, e2, rtol1, rtol2, spltol, nsplit,isplit, m, w, werr, wgap, iblock, indexw, gers, pivmin, work, iwork, info )

Description

To find the desired eigenvalues of a given real symmetric tridiagonal matrix T, the routine setsany “small” off-diagonal elements to zero, and for each unreduced block Ti, it finds

• a suitable shift at one end of the block spectrum

• the base representation, Ti - σi *I = Li*Di*LiT, and

• eigenvalues of each Li*Di*LiT.

The representations and eigenvalues found are then used by ?stemr to compute the eigenvectorsof a symmetric tridiagonal matrix. The accuracy varies depending on whether bisection is usedto find a few eigenvalues or the dqds algorithm (subroutine ?lasq2) to compute all and discardany unwanted one. As an added benefit, ?larre also outputs the n Gerschgorin intervals forthe matrices Li*Di*Li

T.

1406


Input Parameters

CHARACTER.range= 'A': ("All") all eigenvalues will be found.= 'V': ("Value") all eigenvalues in the half-open interval(vl, vu] will be found.= 'I': ("Index") the il-th through iu-th eigenvalues ofthe entire matrix will be found.

INTEGER. The order of the matrix. n > 0.n

REAL for slarrevl, vuDOUBLE PRECISION for dlarreIf range='V', the lower and upper bounds for theeigenvalues. Eigenvalues less than or equal to vl, or greaterthan vu, are not returned. vl < vu.

INTEGER.il, iuIf range='I', the indices (in ascending order) of the

smallest and largest eigenvalues to be returned. 1 ≤ il ≤

iu ≤ n.

REAL for slarredDOUBLE PRECISION for dlarreArray, DIMENSION (n).The n diagonal elements of the diagonal matrices T.

REAL for slarreeDOUBLE PRECISION for dlarreArray, DIMENSION (n). The first (n-1) entries contain thesubdiagonal elements of the tridiagonal matrix T; e(n) neednot be set.

REAL for slarree2DOUBLE PRECISION for dlarreArray, DIMENSION (n). The first (n-1) entries contain thesquares of the subdiagonal elements of the tridiagonalmatrix T; e2(n) need not be set.

REAL for slarrertol1, rtol2DOUBLE PRECISION for dlarre

1407

Parameters for bisection. An interval [LEFT,RIGHT] hasconverged if RIGHT-LEFT.LT.MAX( rtol1*gap,rtol2*max(|LEFT|,|RIGHT|) ).

REAL for slarrespltolDOUBLE PRECISION for dlarreThe threshold for splitting.

REAL for slarreworkDOUBLE PRECISION for dlarreWorkspace array, DIMENSION (6*n).


Output Parameters

On exit, if range='I' or ='A', contain the bounds on thedesired part of the spectrum.

vl, vu

On exit, the n diagonal elements of the diagonal matricesDi .

d

On exit, the subdiagonal elements of the unit bidiagonal

matrices Li . The entries e( isplit( i) ), 1 ≤ i ≤ nsplit,contain the base points sigmai on output.

e

On exit, the entries e2( isplit( i) ), 1 ≤ i ≤ nsplit, havebeen set to zero.

e2

INTEGER. The number of blocks T splits into. 1 ≤ nsplit

≤ n.

nsplit

INTEGER. Array, DIMENSION (n). The splitting points, atwhich T breaks up into blocks. The first block consists ofrows/columns 1 to isplit(1), the second of rows/columns

isplit

isplit(1)+1 through isplit(2), etc., and the nsplit-thconsists of rows/columns isplit(nsplit-1)+1 throughisplit(nsplit)=n.

INTEGER. The total number of eigenvalues (of all theLi*Di*Li

T) found.m

REAL for slarrewDOUBLE PRECISION for dlarre

1408


Array, DIMENSION (n). The first m elements contain theeigenvalues. The eigenvalues of each of the blocks,Li*Di*Li

T, are sorted in ascending order. The routine mayuse the remaining n-m elements as workspace.

REAL for slarrewerrDOUBLE PRECISION for dlarreArray, DIMENSION (n). The error bound on the correspondingeigenvalue in w.

REAL for slarrewgapDOUBLE PRECISION for dlarreArray, DIMENSION (n). The separation from the rightneighbor eigenvalue in w. The gap is only with respect tothe eigenvalues of the same block as each block has its ownrepresentation tree. Exception: at the right end of a blockthe left gap is stored.

INTEGER. Array, DIMENSION (n).iblockThe indices of the blocks (submatrices) associated with thecorresponding eigenvalues in w; iblock(i)=1 if eigenvaluew(i) belongs to the first block from the top, =2 if w(i)belongs to the second block, etc.

INTEGER. Array, DIMENSION (n).indexwThe indices of the eigenvalues within each block (submatrix);for example, indexw(i)= 10 and iblock(i)=2 imply thatthe i-th eigenvalue w(i) is the 10-th eigenvalue in thesecond block.

REAL for slarregersDOUBLE PRECISION for dlarreArray, DIMENSION (2*n). The n Gerschgorin intervals (thei-th Gerschgorin interval is (gers(2*i-1), gers(2*i)).

REAL for slarrepivminDOUBLE PRECISION for dlarreThe minimum pivot in the Sturm sequence for T .

INTEGER.infoIf info = 0: successful exitIf info > 0: A problem occured in ?larre. If info = 5,the Rayleigh Quotient Iteration failed to converge to fullaccuracy.

1409


If info < 0: One of the called subroutines signaled aninternal problem. Inspection of the corresponding parameterinfo for further information is required.

• If info = -1, there is a problem in ?larrd

• If info = -2, no base representation could be found inmaxtry iterations. Increasing maxtry and recompilationmight be a remedy.

• If info = -3, there is a problem in ?larrb whencomputing the refined root representation for ?lasq2.

• If info = -4, there is a problem in ?larrb whenpreforming bisection on the desired part of the spectrum.

• If info = -5, there is a problem in ?lasq2.

• If info = -6, there is a problem in ?lasq2.

?larrfFinds a new relatively robust representation suchthat at least one of the eigenvalues is relativelyisolated.

Syntax

call slarrf( n, d, l, ld, clstrt, clend, w, wgap, werr, spdiam, clgapl, clgapr,pivmin, sigma, dplus, lplus, work, info )

call dlarrf( n, d, l, ld, clstrt, clend, w, wgap, werr, spdiam, clgapl, clgapr,pivmin, sigma, dplus, lplus, work, info )

Description

Given the initial representation L*D*LT and its cluster of close eigenvalues (in a relative measure),w(clstrt), w(clstrt+1), ... w(clend), the routine ?larrf finds a new relatively robustrepresentation

L*D*LT - σi*I = L(+)*D(+)*L(+)T

such that at least one of the eigenvalues of L(+)*D*(+)*L(+)T is relatively isolated.

1410

Input Parameters

INTEGER. The order of the matrix (subblock, if the matrixis splitted).

n

REAL for slarrfdDOUBLE PRECISION for dlarrfArray, DIMENSION (n). The n diagonal elements of thediagonal matrix D.

REAL for slarrflDOUBLE PRECISION for dlarrfArray, DIMENSION (n-1).The (n-1) subdiagonal elements of the unit bidiagonal matrixL.

REAL for slarrfldDOUBLE PRECISION for dlarrfArray, DIMENSION (n-1).The n-1 elements Li*Di.

INTEGER. The index of the first eigenvalue in the cluster.clstrt

INTEGER. The index of the last eigenvalue in the cluster.clend

REAL for slarrfwDOUBLE PRECISION for dlarrf

Array, DIMENSION ≥ (clend -clstrt+1). The eigenvalueapproximations of L*D*LT in ascending order. w(clstrt)through w(clend) form the cluster of relatively closeeigenvalues.

REAL for slarrfwgapDOUBLE PRECISION for dlarrf

Array, DIMENSION ≥ (clend -clstrt+1). The separationfrom the right neighbor eigenvalue in w.

REAL for slarrfwerrDOUBLE PRECISION for dlarrf

Array, DIMENSION ≥ (clend -clstrt+1). On input, werrcontains the semiwidth of the uncertainty interval of thecorresponding eigenvalue approximation in w.

REAL for slarrfspdiamDOUBLE PRECISION for dlarrf

1411


Estimate of the spectral diameter obtained from theGerschgorin intervals.

REAL for slarrfclgapl, clgaprDOUBLE PRECISION for dlarrfAbsolute gap on each end of the cluster. Set by the callingroutine to protect against shifts too close to eigenvaluesoutside the cluster.

REAL for slarrfpivminDOUBLE PRECISION for dlarrfThe minimum pivot allowed in the Sturm sequence.

REAL for slarrfworkDOUBLE PRECISION for dlarrfWorkspace array, DIMENSION (2*n).

Output Parameters

On output, the gaps are refined.wgap

REAL for slarrfsigmaDOUBLE PRECISION for dlarrfThe shift used to form L(+)*D*(+)*L(+)T.

REAL for slarrfdplusDOUBLE PRECISION for dlarrfArray, DIMENSION (n). The n diagonal elements of thediagonal matrix D(+).

REAL for slarrflplusDOUBLE PRECISION for dlarrfArray, DIMENSION (n). The first (n-1) elements of lpluscontain the subdiagonal elements of the unit bidiagonalmatrix L(+).

1412


?larrjPerforms refinement of the initial estimates of theeigenvalues of the matrix T.

Syntax

call slarrj( n, d, e2, ifirst, ilast, rtol, offset, w, werr, work, iwork,pivmin, spdiam, info )

call dlarrj( n, d, e2, ifirst, ilast, rtol, offset, w, werr, work, iwork,pivmin, spdiam, info )

Description

Given the initial eigenvalue approximations of T, this routine does bisection to refine theeigenvalues of T, w(ifirst-offset) through w(ilast-offset) , to more accuracy. Initialguesses for these eigenvalues are input in w, the corresponding estimate of the error in theseguesses in werr. During bisection, intervals [a,b] are maintained by storing their mid-pointsand semi-widths in the arrays w and werr respectively.

Input Parameters

INTEGER. The order of the matrix T.n

REAL for slarrjdDOUBLE PRECISION for dlarrjArray, DIMENSION (n).Contains n diagonal elements of the matrix T.

REAL for slarrje2DOUBLE PRECISION for dlarrjArray, DIMENSION (n-1).Contains (n-1) squared sub-diagonal elements of the T.

INTEGER.ifirstThe index of the first eigenvalue to be computed.

INTEGER.ilastThe index of the last eigenvalue to be computed.REAL for slarrjrtolDOUBLE PRECISION for dlarrj

1413


Tolerance for the convergence of the bisection intervals. An

interval [a,b] is considered to be converged if (b-a) ≤rtol*max(|a|,|b|).

INTEGER.offsetOffset for the arrays w and werr, that is theifirst-offset through ilast-offset elements ofthese arrays are to be used.REAL for slarrjwDOUBLE PRECISION for dlarrjArray, DIMENSION (n).On input, w(ifirst-offset) through w(ilast-offset)are estimates of the eigenvalues of L*D*L**T indexedifirst through ilast.

REAL for slarrjwerrDOUBLE PRECISION for dlarrjArray, DIMENSION (n).On input, werr(ifirst-offset) throughwerr(ilast-offset) are the errors in the estimates ofthe corresponding elements in w.

REAL for slarrjworkDOUBLE PRECISION for dlarrjWorkspace array, DIMENSION (2*n).


REAL for slarrjpivminDOUBLE PRECISION for dlarrjThe minimum pivot in the Sturm sequence for the matrixT.

REAL for slarrjspdiamDOUBLE PRECISION for dlarrjThe spectral diameter of the matrix T.

Output Parameters

On exit, contains the refined estimates of the eigenvalues.w

1414


On exit, contains the refined errors in the estimates of thecorresponding elements in w.

werr

INTEGER.infoNow it is not used and always is set to 0.

?larrkComputes one eigenvalue of a symmetrictridiagonal matrix T to suitable accuracy.

Syntax

call slarrk( n, iw, gl, gu, d, e2, pivmin, reltol, w, werr, info )

call dlarrk( n, iw, gl, gu, d, e2, pivmin, reltol, w, werr, info )

Description

The routine computes one eigenvalue of a symmetric tridiagonal matrix T to suitable accuracy.This is an auxiliary code to be called from ?stemr.

To avoid overflow, the matrix must be scaled so that its largest element is no greater than(overflow1/2*underflow1/4) in absolute value, and for greatest accuracy, it should not bemuch smaller than that. (For more details see [[Kahan66]].

Input Parameters

INTEGER. The order of the matrix T. (n ≥ 1).n

INTEGER.iwThe index of the eigenvalue to be returned.

REAL for slarrkgl, guDOUBLE PRECISION for dlarrkAn upper and a lower bound on the eigenvalue.REAL for slarrkdDOUBLE PRECISION for dlarrkArray, DIMENSION (n).Contains n diagonal elements of the matrix T.

REAL for slarrke2DOUBLE PRECISION for dlarrkArray, DIMENSION (n-1).

1415


Contains (n-1) squared off-diagonal elements of the T.

REAL for slarrkpivminDOUBLE PRECISION for dlarrkThe minimum pivot in the Sturm sequence for the matrixT.

REAL for slarrkreltolDOUBLE PRECISION for dlarrkThe minimum relative width of an interval. When an intervalis narrower than reltol times the larger (in magnitude)endpoint, then it is considered to be sufficiently small, thatis converged. Note: this should always be at leastradix*machine epsilon.

Output Parameters

REAL for slarrkwDOUBLE PRECISION for dlarrkContains the eigenvalue approximation.

REAL for slarrkwerrDOUBLE PRECISION for dlarrkContains the error bound on the corresponding eigenvalueapproximation in w.

INTEGER.info= 0: Eigenvalue converges= -1: Eigenvalue does not converge

?larrrPerforms tests to decide whether the symmetrictridiagonal matrix T warrants expensivecomputations which guarantee high relativeaccuracy in the eigenvalues.

Syntax

call slarrr( n, d, e, info )

call dlarrr( n, d, e, info )

1416


Description

The routine performs tests to decide whether the symmetric tridiagonal matrix T warrantsexpensive computations which guarantee high relative accuracy in the eigenvalues.

Input Parameters

INTEGER. The order of the matrix T. (n > 0).n

REAL for slarrrdDOUBLE PRECISION for dlarrrArray, DIMENSION (n).Contains n diagonal elements of the matrix T.

REAL for slarrreDOUBLE PRECISION for dlarrrArray, DIMENSION (n).The first (n-1) entries contain sub-diagonal elements ofthe tridiagonal matrix T; e(n) is set to 0.

Output Parameters

INTEGER.info= 0: the matrix warrants computations preserving relativeaccuracy (default value).= -1: the matrix warrants computations guaranteeing onlyabsolute accuracy.

1417


?larrvComputes the eigenvectors of the tridiagonalmatrix T = L*D* LT given L, D and the eigenvaluesof L*D* LT.

Syntax

call slarrv( n, vl, vu, d, l, pivmin, isplit, m, dol, dou, minrgp, rtol1,rtol2, w, werr, wgap, iblock, indexw, gers, z, ldz, isuppz, work, iwork, info)

call dlarrv( n, vl, vu, d, l, pivmin, isplit, m, dol, dou, minrgp, rtol1,rtol2, w, werr, wgap, iblock, indexw, gers, z, ldz, isuppz, work, iwork, info)

call clarrv( n, vl, vu, d, l, pivmin, isplit, m, dol, dou, minrgp, rtol1,rtol2, w, werr, wgap, iblock, indexw, gers, z, ldz, isuppz, work, iwork, info)

call zlarrv( n, vl, vu, d, l, pivmin, isplit, m, dol, dou, minrgp, rtol1,rtol2, w, werr, wgap, iblock, indexw, gers, z, ldz, isuppz, work, iwork, info)

Description

The routine ?larrv computes the eigenvectors of the tridiagonal matrix T = L*D* LT given L,D and approximations to the eigenvalues of L*D* LT.

The input eigenvalues should have been computed by slarre for real flavors (slarrv/clarrv)and by dlarre for double precision flavors (dlarre/zlarre).

Input Parameters

INTEGER. The order of the matrix. n ≥ 0.n

REAL for slarrv/clarrvvl, vuDOUBLE PRECISION for dlarrv/zlarrvLower and upper bounds respectively of the interval thatcontains the desired eigenvalues. vl < vu. Needed tocompute gaps on the left or right end of the extremaleigenvalues in the desired range.

REAL for slarrv/clarrvd

1418

DOUBLE PRECISION for dlarrv/zlarrvArray, DIMENSION (n). On entry, the n diagonal elementsof the diagonal matrix D.

REAL for slarrv/clarrvlDOUBLE PRECISION for dlarrv/zlarrvArray, DIMENSION (n).On entry, the (n-1) subdiagonal elements of the unitbidiagonal matrix L are contained in elements 1 to n-1 of Lif the matrix is not splitted. At the end of each block thecorresponding shift is stored as given by slarre for realflavors and by dlarre for double precision flavors.

REAL for slarrv/clarrvpivminDOUBLE PRECISION for dlarrv/zlarrvThe minimum pivot allowed in the Sturm sequence.

INTEGER. Array, DIMENSION (n).isplitThe splitting points, at which T breaks up into blocks. Thefirst block consists of rows/columns 1 to isplit(1), thesecond of rows/columns isplit(1)+1 through isplit(2),etc.

INTEGER. The total number of eigenvalues found.m

0 ≤ m ≤ n. If range = 'A', m = n, and if range = 'I',m = iu - il +1.

INTEGER.dol, douIf you want to compute only selected eigenvectors from allthe eigenvalues supplied, specify an index range dol:dou.Or else apply the setting dol=1, dou=m. Note that dol anddou refer to the order in which the eigenvalues are storedin w.If you want to compute only selected eigenpairs, then thecolumns dol-1 to dou+1 of the eigenvector space Z containthe computed eigenvectors. All other columns of Z are setto zero.

REAL for slarrv/clarrvminrgp, rtol1, rtol2DOUBLE PRECISION for dlarrv/zlarrvParameters for bisection. An interval [LEFT,RIGHT] hasconverged if RIGHT-LEFT.LT.MAX( rtol1*gap,rtol2*max(|LEFT|,|RIGHT|) ).

1419


REAL for slarrv/clarrvwDOUBLE PRECISION for dlarrv/zlarrvArray, DIMENSION (n). The first m elements of w contain theapproximate eigenvalues for which eigenvectors are to becomputed. The eigenvalues should be grouped by split-offblock and ordered from smallest to largest within the block(the output array w from ?larre is expected here). Theseeigenvalues are set with respect to the shift of thecorresponding root representation for their block.

REAL for slarrv/clarrvwerrDOUBLE PRECISION for dlarrv/zlarrvArray, DIMENSION (n). The first m elements contain thesemiwidth of the uncertainty interval of the correspondingeigenvalue in w.

REAL for slarrv/clarrvwgapDOUBLE PRECISION for dlarrv/zlarrvArray, DIMENSION (n). The separation from the rightneighbor eigenvalue in w.

INTEGER. Array, DIMENSION (n).iblockThe indices of the blocks (submatrices) associated with thecorresponding eigenvalues in w; iblock(i)=1 if eigenvaluew(i) belongs to the first block from the top, =2 if w(i)belongs to the second block, etc.

INTEGER. Array, DIMENSION (n).indexwThe indices of the eigenvalues within each block (submatrix);for example, indexw(i)= 10 and iblock(i)=2 imply thatthe i-th eigenvalue w(i) is the 10-th eigenvalue in thesecond block.

REAL for slarrv/clarrvgersDOUBLE PRECISION for dlarrv/zlarrvArray, DIMENSION (2*n). The n Gerschgorin intervals (thei-th Gerschgorin interval is (gers(2*i-1), gers(2*i)).The Gerschgorin intervals should be computed from theoriginal unshifted matrix.

INTEGER. The leading dimension of the output array Z. ldz

≥ 1, and if jobz = 'V', ldz ≥ max(1,n).

ldz

1420


REAL for slarrv/clarrvworkDOUBLE PRECISION for dlarrv/zlarrvWorkspace array, DIMENSION (12*n).


Output Parameters

On exit, d may be overwritten.d

On exit, l is overwritten.l

On exit, w holds the eigenvalues of the unshifted matrix.w

On exit, werr contains refined values of its inputapproximations.

werr

On exit, wgap contains refined values of its inputapproximations. Very small gaps are changed.

wgap

REAL for slarrvzDOUBLE PRECISION for dlarrvCOMPLEX for clarrvCOMPLEX*16 for zlarrvArray, DIMENSION (ldz, max(1,m) ).If info = 0, the first m columns of z contain the orthonormaleigenvectors of the matrix T corresponding to the inputeigenvalues, with the i-th column of z holding theeigenvector associated with w(i).

NOTE. The user must ensure that at least max(1,m)columns are supplied in the array z.

INTEGER .isuppzArray, DIMENSION (2*max(1,m)). The support of theeigenvectors in z, that is, the indices indicating the nonzeroelements in z. The i-th eigenvector is nonzero only inelements isuppz(2i-1) through isuppz(2i).

INTEGER.infoIf info = 0: successful exit

1421


If info > 0: A problem occured in ?larrv. If info = 5,the Rayleigh Quotient Iteration failed to converge to fullaccuracy.If info < 0: One of the called subroutines signaled aninternal problem. Inspection of the corresponding parameterinfo for further information is required.

• If info = -1, there is a problem in ?larrb when refininga child eigenvalue;

• If info = -2, there is a problem in ?larrf whencomputing the relatively robust representation (RRR) ofa child. When a child is inside a tight cluster, it can bedifficult to find an RRR. A partial remedy from the user'spoint of view is to make the parameter minrgp smallerand recompile. However, as the orthogonality of thecomputed vectors is proportional to 1/minrgp, you shouldbe aware that you might be trading in precision whenyou decrease minrgp.

• If info = -3, there is a problem in ?larrb when refininga single eigenvalue after the Rayleigh correction wasrejected.

?lartgGenerates a plane rotation with real cosine andreal/complex sine.

Syntax

call slartg( f, g, cs, sn, r )

call dlartg( f, g, cs, sn, r )

call clartg( f, g, cs, sn, r )

call zlartg( f, g, cs, sn, r )

Description

The routine generates a plane rotation so that

1422

where cs2 + |sn|2 = 1

This is a slower, more accurate version of the BLAS Level 1 routine ?rotg, except for thefollowing differences.

For slartg/dlartg:

f and g are unchanged on return;

If g=0, then cs=1 and sn=0;

If f=0 and g ≠ 0, then cs=0 and sn=1 without doing any floating point operations (saves workin ?bdsqr when there are zeros on the diagonal);

If f exceeds g in magnitude, cs will be positive.

For clartg/zlartg:

f and g are unchanged on return;

If g=0, then cs=1 and sn=0;

If f=0, then cs=0 and sn is chosen so that r is real.

Input Parameters

REAL for slartgf, gDOUBLE PRECISION for dlartgCOMPLEX for clartgCOMPLEX*16 for zlartgThe first and second component of vector to be rotated.

Output Parameters

REAL for slartg/clartgcsDOUBLE PRECISION for dlartg/zlartgThe cosine of the rotation.

REAL for slartgsn

1423


DOUBLE PRECISION for dlartgCOMPLEX for clartgCOMPLEX*16 for zlartgThe sine of the rotation.

REAL for slartgrDOUBLE PRECISION for dlartgCOMPLEX for clartgCOMPLEX*16 for zlartgThe nonzero component of the rotated vector.

?lartvApplies a vector of plane rotations with real cosinesand real/complex sines to the elements of a pairof vectors.

Syntax

call slartv( n, x, incx, y, incy, c, s, incc )

call dlartv( n, x, incx, y, incy, c, s, incc )

call clartv( n, x, incx, y, incy, c, s, incc )

call zlartv( n, x, incx, y, incy, c, s, incc )

Description

The routine applies a vector of real/complex plane rotations with real cosines to elements ofthe real/complex vectors x and y. For i = 1,2,...,n

Input Parameters

INTEGER. The number of plane rotations to be applied.n

REAL for slartvx, y

1424


DOUBLE PRECISION for dlartvCOMPLEX for clartvCOMPLEX*16 for zlartvArrays, DIMENSION (1+(n-1)*incx) and (1+(n-1)*incy),respectively. The input vectors x and y.

INTEGER. The increment between elements of x. incx > 0.incx

INTEGER. The increment between elements of y. incy > 0.incy

REAL for slartv/clartvcDOUBLE PRECISION for dlartv/zlartvArray, DIMENSION (1+(n-1)*incc).The cosines of the plane rotations.

REAL for slartvsDOUBLE PRECISION for dlartvCOMPLEX for clartvCOMPLEX*16 for zlartvArray, DIMENSION (1+(n-1)*incc).The sines of the plane rotations.

INTEGER. The increment between elements of c and s. incc> 0.

incc

Output Parameters

The rotated vectors x and y.x, y

?laruvReturns a vector of n random real numbers froma uniform distribution.

Syntax

call slaruv( iseed, n, x )

call dlaruv( iseed, n, x )

Description

The routine ?laruv returns a vector of n random real numbers from a uniform (0,1) distribution

(n ≤ 128).

1425


This is an auxiliary routine called by ?larnv.

Input Parameters

INTEGER. Array, DIMENSION (4). On entry, the seed of therandom number generator; the array elements must bebetween 0 and 4095, and iseed(4) must be odd.

iseed

INTEGER. The number of random numbers to be generated.

n ≤ 128.

n

Output Parameters

REAL for slaruvxDOUBLE PRECISION for dlaruvArray, DIMENSION (n). The generated random numbers.

On exit, the seed is updated.seed

?larzApplies an elementary reflector (as returned by?tzrzf) to a general matrix.

Syntax

call slarz( side, m, n, l, v, incv, tau, c, ldc, work )

call dlarz( side, m, n, l, v, incv, tau, c, ldc, work )

call clarz( side, m, n, l, v, incv, tau, c, ldc, work )

call zlarz( side, m, n, l, v, incv, tau, c, ldc, work )

Description

The routine ?larz applies a real/complex elementary reflector H to a real/complex m-by-nmatrix C, from either the left or the right. H is represented in the form

H = I - tau*v*v',


If tau = 0, then H is taken to be the unit matrix.

1426


For complex flavors, to apply H '(the conjugate transpose of H), supply conjg(tau) instead oftau.

H is a product of k elementary reflectors as returned by ?tzrzf.

Input Parameters

CHARACTER*1.sideIf side = 'L': form H*CIf side = 'R': form C*H



INTEGER. The number of entries of the vector v containingthe meaningful part of the Householder vectors.

l

If side = 'L', m ≥ L ≥ 0,

if side = 'R', n ≥ L ≥ 0.

REAL for slarzvDOUBLE PRECISION for dlarzCOMPLEX for clarzCOMPLEX*16 for zlarzArray, DIMENSION (1+(l-1)*abs(incv)).The vector v in the representation of H as returned by?tzrzf.v is not used if tau = 0.

INTEGER. The increment between elements of v.incv

incv ≠ 0.

REAL for slarztauDOUBLE PRECISION for dlarzCOMPLEX for clarzCOMPLEX*16 for zlarzThe value tau in the representation of H.

REAL for slarzcDOUBLE PRECISION for dlarzCOMPLEX for clarzCOMPLEX*16 for zlarzArray, DIMENSION (ldc,n).On entry, the m-by-n matrix C.

1427



ldc ≥ max(1,m).

REAL for slarzworkDOUBLE PRECISION for dlarzCOMPLEX for clarzCOMPLEX*16 for zlarzWorkspace array, DIMENSION(n) if side = 'L' or(m) if side = 'R'.

Output Parameters


c

?larzbApplies a block reflector or itstranspose/conjugate-transpose to a general matrix.

Syntax

call slarzb( side, trans, direct, storev, m, n, k, l, v, ldv, t, ldt, c, ldc,work, ldwork )

call dlarzb( side, trans, direct, storev, m, n, k, l, v, ldv, t, ldt, c, ldc,work, ldwork )

call clarzb( side, trans, direct, storev, m, n, k, l, v, ldv, t, ldt, c, ldc,work, ldwork )

call zlarzb( side, trans, direct, storev, m, n, k, l, v, ldv, t, ldt, c, ldc,work, ldwork )

Description

The routine applies a real/complex block reflector H or its transpose HT (or H for complex flavors)to a real/complex distributed m-by-n matrix C from the left or the right. Currently, only storev= 'R' and direct = 'B' are supported.

1428


Input Parameters

CHARACTER*1.sideIf side = 'L': apply H or H' from the leftIf side = 'R': apply H or H' from the right

CHARACTER*1.transIf trans = 'N': apply H (No transpose)If trans='C': apply H' (Transpose/conjugate transpose)

CHARACTER*1.directIndicates how H is formed from a product of elementaryreflectors= 'F': H = H(1) H(2)... H(k) (forward, not supported)= 'B': H = H(k)... H(2) H(1) (backward)

CHARACTER*1.storevIndicates how the vectors which define the elementaryreflectors are stored:= 'C': Column-wise (not supported)= 'R': Row-wise.



INTEGER. The order of the matrix T (equal to the numberof elementary reflectors whose product defines the blockreflector).

k

INTEGER. The number of columns of the matrix V containingthe meaningful part of the Householder reflectors.

l

If side = 'L', m ≥ l ≥ 0, if side = 'R', n ≥ l ≥ 0.

REAL for slarzbvDOUBLE PRECISION for dlarzbCOMPLEX for clarzbCOMPLEX*16 for zlarzbArray, DIMENSION (ldv, nv).If storev = 'C', nv = k;if storev = 'R', nv = l.


If storev = 'C', ldv ≥ l; if storev = 'R', ldv ≥ k.

1429


REAL for slarzbtDOUBLE PRECISION for dlarzbCOMPLEX for clarzbCOMPLEX*16 for zlarzbArray, DIMENSION (ldt,k). The triangular k-by-k matrix Tin the representation of the block reflector.

INTEGER. The leading dimension of the array t.ldt

ldt ≥ k.

REAL for slarzbcDOUBLE PRECISION for dlarzbCOMPLEX for clarzbCOMPLEX*16 for zlarzbArray, DIMENSION (ldc,n). On entry, the m-by-n matrix C.


ldc ≥ max(1,m).

REAL for slarzbworkDOUBLE PRECISION for dlarzbCOMPLEX for clarzbCOMPLEX*16 for zlarzbWorkspace array, DIMENSION (ldwork, k).

INTEGER. The leading dimension of the array work.ldwork

If side = 'L', ldwork ≥ max(1, n);

if side = 'R', ldwork ≥ max(1, m).

Output Parameters

On exit, C is overwritten by H*C, or H'*C, or C*H, or C*H'.c

1430


?larztForms the triangular factor T of a block reflector H= I - V*T*VH.

Syntax

call slarzt( direct, storev, n, k, v, ldv, tau, t, ldt )

call dlarzt( direct, storev, n, k, v, ldv, tau, t, ldt )

call clarzt( direct, storev, n, k, v, ldv, tau, t, ldt )

call zlarzt( direct, storev, n, k, v, ldv, tau, t, ldt )

Description

The routine forms the triangular factor T of a real/complex block reflector H of order > n, whichis defined as a product of k elementary reflectors.

If direct = 'F', H = H(1) H(2) . . . H(k) and T is upper triangular.


If storev = 'C', the vector which defines the elementary reflector H(i) is stored in the i-thcolumn of the array v, and H = I - V*T*V'

If storev = 'R', the vector which defines the elementary reflector H(i) is stored in the i-throw of the array v, and H = I - V'*T*V

Currently, only storev = 'R' and direct = 'B' are supported.

Input Parameters

CHARACTER*1.directSpecifies the order in which the elementary reflectors aremultiplied to form the block reflector:If direct = 'F': H = H(1) H(2) . . . H(k) (forward,not supported)If direct = 'B': H = H(k) . . . H(2) H(1) (backward)

CHARACTER*1.storevSpecifies how the vectors which define the elementaryreflectors are stored (see also Application Notes below):If storev = 'C': column-wise (not supported)If storev = 'R': row-wise

1431


INTEGER. The order of the block reflector H. n ≥ 0.n

INTEGER. The order of the triangular factor T (equal to the

number of elementary reflectors). k ≥ 1.

k

REAL for slarztvDOUBLE PRECISION for dlarztCOMPLEX for clarztCOMPLEX*16 for zlarztArray, DIMENSION(ldv, k) if storev = 'C'(ldv, n) if storev = 'R' The matrix V.


If storev = 'C', ldv ≥ max(1,n);


REAL for slarzttauDOUBLE PRECISION for dlarztCOMPLEX for clarztCOMPLEX*16 for zlarztArray, DIMENSION (k). tau(i) must contain the scalar factorof the elementary reflector H(i).

INTEGER. The leading dimension of the output array t.ldt

ldt ≥ k.

Output Parameters

REAL for slarzttDOUBLE PRECISION for dlarztCOMPLEX for clarztCOMPLEX*16 for zlarztArray, DIMENSION (ldt,k). The k-by-k triangular factor Tof the block reflector. If direct = 'F', T is uppertriangular; if direct = 'B', T is lower triangular. The restof the array is not used.

The matrix V. See Application Notes below.v

1432


Application Notes


1433


?las2Computes singular values of a 2-by-2 triangularmatrix.

Syntax

call slas2( f, g, h, ssmin, ssmax )

call dlas2( f, g, h, ssmin, ssmax )

Description

The routine ?las2 computes the singular values of the 2-by-2 matrix

1434


On return, ssmin is the smaller singular value and SSMAX is the larger singular value.

Input Parameters

REAL for slas2f, g, hDOUBLE PRECISION for dlas2The (1,1), (1,2) and (2,2) elements of the 2-by-2 matrix,respectively.

Output Parameters

REAL for slas2ssmin, ssmaxDOUBLE PRECISION for dlas2The smaller and the larger singular values, respectively.

Application Notes

Barring over/underflow, all output quantities are correct to within a few units in the last place(ulps), even in the absence of a guard digit in addition/subtraction. In ieee arithmetic, thecode works correctly if one matrix element is infinite. Overflow will not occur unless the largestsingular value itself overflows, or is within a few ulps of overflow. (On machines with partialoverflow, like the Cray, overflow may occur if the largest singular value is within a factor of 2of overflow.) Underflow is harmless if underflow is gradual. Otherwise, results may correspondto a matrix modified by perturbations of size near the underflow threshold.

1435


?lasclMultiplies a general rectangular matrix by a realscalar defined as cto/cfrom.

Syntax

call slascl( type, kl, ku, cfrom, cto, m, n, a, lda, info )

call dlascl( type, kl, ku, cfrom, cto, m, n, a, lda, info )

call clascl( type, kl, ku, cfrom, cto, m, n, a, lda, info )

call zlascl( type, kl, ku, cfrom, cto, m, n, a, lda, info )

Description

The routine ?lascl multiplies the m-by-n real/complex matrix A by the real scalar cto/cfrom.The operation is performed without over/underflow as long as the final result cto*A(i,j)/cfromdoes not over/underflow.

type specifies that A may be full, upper triangular, lower triangular, upper Hessenberg, orbanded.

Input Parameters

CHARACTER*1. type indices the storage type of the inputmatrix.

type

= 'G': A is a full matrix.= 'L': A is a lower triangular matrix.= 'U': A is an upper triangular matrix.= 'H': A is an upper Hessenberg matrix.= 'B': A is a symmetric band matrix with lower bandwidthkl and upper bandwidth ku and with the only the lower halfstored= 'Q': A is a symmetric band matrix with lower bandwidthkl and upper bandwidth ku and with the only the upper halfstored.= 'Z': A is a band matrix with lower bandwidth kl andupper bandwidth ku.

INTEGER. The lower bandwidth of A. Referenced only if type= 'B', 'Q' or 'Z'.

kl

1436


INTEGER. The upper bandwidth of A. Referenced only if type= 'B', 'Q' or 'Z'.

ku

REAL for slascl/clasclcfrom, ctoDOUBLE PRECISION for dlascl/zlasclThe matrix A is multiplied by cto/cfrom. A(i,j) is computedwithout over/underflow if the final result cto*A(i,j)/cfromcan be represented without over/underflow. cfrom must benonzero.



REAL for slasclaDOUBLE PRECISION for dlasclCOMPLEX for clasclCOMPLEX*16 for zlasclArray, DIMENSION (lda, n). The matrix to be multiplied bycto/cfrom. See type for the storage type.


lda ≥ max(1,m).

Output Parameters

The multiplied matrix A.a

INTEGER.infoIf info = 0 - successful exitIf info = -i < 0, the i-th argument had an illegal value.

?lasd0Computes the singular values of a real upperbidiagonal n-by-m matrix B with diagonal d andoff-diagonal e. Used by ?bdsdc.

Syntax

call slasd0( n, sqre, d, e, u, ldu, vt, ldvt, smlsiz, iwork, work, info )

call dlasd0( n, sqre, d, e, u, ldu, vt, ldvt, smlsiz, iwork, work, info )

1437

Description

Using a divide and conquer approach, the routine ?lasd0 computes the singular valuedecomposition (SVD) of a real upper bidiagonal n-by-m matrix B with diagonal d and offdiagonale, where m = n + sqre.

The algorithm computes orthogonal matrices U and VT such that B = U*S*VT. The singularvalues S are overwritten on d.

The related subroutine ?lasda computes only the singular values, and optionally, the singularvectors in compact form.

Input Parameters

INTEGER. On entry, the row dimension of the upperbidiagonal matrix. This is also the dimension of the maindiagonal array d.

n

INTEGER. Specifies the column dimension of the bidiagonalmatrix.

sqre

If sqre = 0: the bidiagonal matrix has column dimensionm = n.If sqre = 1: the bidiagonal matrix has column dimensionm = n+1.

REAL for slasd0dDOUBLE PRECISION for dlasd0Array, DIMENSION (n). On entry, d contains the maindiagonal of the bidiagonal matrix.

REAL for slasd0eDOUBLE PRECISION for dlasd0Array, DIMENSION (m-1). Contains the subdiagonal entriesof the bidiagonal matrix. On exit, e is destroyed.

INTEGER. On entry, leading dimension of the output arrayu.

ldu

INTEGER. On entry, leading dimension of the output arrayvt.

ldvt

INTEGER. On entry, maximum size of the subproblems atthe bottom of the computation tree.

smlsiz

INTEGER.iworkWorkspace array, dimension must be at least (8 * n).

1438


REAL for slasd0workDOUBLE PRECISION for dlasd0Workspace array, dimension must be at least (3 * m2 +2 *m).

Output Parameters

On exit d, If info = 0, contains singular values of thebidiagonal matrix.

d

REAL for slasd0uDOUBLE PRECISION for dlasd0Array, DIMENSION at least (ldq, n). On exit, u contains theleft singular vectors.

REAL for slasd0vtDOUBLE PRECISION for dlasd0Array, DIMENSION at least (ldvt, m). On exit, vt' containsthe right singular vectors.

INTEGER.infoIf info = 0: successful exit.If info = -i < 0, the i-th argument had an illegal value.If info = 1, an singular value did not converge.

?lasd1Computes the SVD of an upper bidiagonal matrixB of the specified size. Used by ?bdsdc.

Syntax

call slasd1( nl, nr, sqre, d, alpha, beta, u, ldu, vt, ldvt, idxq, iwork,work, info )

call dlasd1( nl, nr, sqre, d, alpha, beta, u, ldu, vt, ldvt, idxq, iwork,work, info )

Description

This routine computes the SVD of an upper bidiagonal n-by-m matrix B, where n = nl + nr+ 1 and m = n + sqre.

The routine ?lasd1 is called from ?lasd0.

1439

A related subroutine ?lasd7 handles the case in which the singular values (and the singularvectors in factored form) are desired.

?lasd1 computes the SVD as follows:

= U(out)*(D(out) 0)*VT(out)

where Z' = (Z1' a Z2' b) = u' VT ', and u is a vector of dimension m with alpha and beta inthe nl+1 and nl+2-th entries and zeros elsewhere; and the entry b is empty if sqre = 0.

The left singular vectors of the original matrix are stored in u, and the transpose of the rightsingular vectors are stored in vt, and the singular values are in d. The algorithm consists ofthree stages:

1. The first stage consists of deflating the size of the problem when there are multiple singularvalues or when there are zeros in the Z vector. For each such occurrence the dimension ofthe secular equation problem is reduced by one. This stage is performed by the routine?lasd2.

2. The second stage consists of calculating the updated singular values. This is done by findingthe square roots of the roots of the secular equation via the routine ?lasd4 (as called by?lasd3). This routine also calculates the singular vectors of the current problem.

3. The final stage consists of computing the updated singular vectors directly using the updatedsingular values. The singular vectors for the current problem are multiplied with the singularvectors from the overall problem.

Input Parameters


nl ≥ 1.


nr ≥ 1.

1440


INTEGER.sqreIf sqre = 0: the lower block is an nr-by-nr square matrix.If sqre = 1: the lower block is an nr-by-(nr+1) rectangularmatrix. The bidiagonal matrix has row dimension n = nl+ nr + 1, and column dimension m = n + sqre.

REAL for slasd1dDOUBLE PRECISION for dlasd1Array, DIMENSION (nl+nr+1). n = nl+nr+1. On entryd(1:nl,1:nl) contains the singular values of the upperblock; and d(nl+2:n) contains the singular values of thelower block.

REAL for slasd1alphaDOUBLE PRECISION for dlasd1Contains the diagonal element associated with the addedrow.

REAL for slasd1betaDOUBLE PRECISION for dlasd1Contains the off-diagonal element associated with the addedrow.

REAL for slasd1uDOUBLE PRECISION for dlasd1Array, DIMENSION (ldu, n). On entry u(1:nl, 1:nl)contains the left singular vectors of the upper block;u(nl+2:n, nl+2:n) contains the left singular vectors ofthe lower block.

INTEGER. The leading dimension of the array U.ldu

ldu ≥ max(1, n).

REAL for slasd1vtDOUBLE PRECISION for dlasd1Array, DIMENSION (ldvt, m), where m = n + sqre.On entry vt(1:nl+1, 1:nl+1)' contains the right singularvectors of the upper block; vt(nl+2:m, nl+2:m)' containsthe right singular vectors of the lower block.

INTEGER. The leading dimension of the array vt.ldvt

ldvt ≥ max(1, M).

INTEGER.iwork

1441


Workspace array, DIMENSION (4n).

REAL for slasd1workDOUBLE PRECISION for dlasd1Workspace array, DIMENSION (3m2 + 2m).

Output Parameters

On exit d(1:n) contains the singular values of the modifiedmatrix.

d

On exit, the diagonal element associated with the addedrow deflated by max( abs( alpha ), abs( beta ),abs( D(I) ) ), I = 1,n.

alpha

On exit, the off-diagonal element associated with the addedrow deflated by max( abs( alpha ), abs( beta ),abs( D(I) ) ), I = 1,n.

beta

On exit u contains the left singular vectors of the bidiagonalmatrix.

u

On exit vt' contains the right singular vectors of thebidiagonal matrix.

vt

INTEGERidxqArray, DIMENSION (n). Contains the permutation which willreintegrate the subproblem just solved back into sortedorder, that is, d(idxq( i = 1, n )) will be in ascendingorder.

INTEGER.infoIf info = 0: successful exit.If info = -i < 0, the i-th argument had an illegal value.If info = 1, an singular value did not converge.

1442

?lasd2Merges the two sets of singular values togetherinto a single sorted set. Used by ?bdsdc.

Syntax

call slasd2( nl, nr, sqre, k, d, z, alpha, beta, u, ldu, vt, ldvt, dsigma,u2, ldu2, vt2, ldvt2, idxp, idx, idxp, idxq, coltyp, info )

call dlasd2( nl, nr, sqre, k, d, z, alpha, beta, u, ldu, vt, ldvt, dsigma,u2, ldu2, vt2, ldvt2, idxp, idx, idxp, idxq, coltyp, info )

Description

The routine ?lasd2 merges the two sets of singular values together into a single sorted set.Then it tries to deflate the size of the problem. There are two ways in which deflation can occur:when two or more singular values are close together or if there is a tiny entry in the Z vector.For each such occurrence the order of the related secular equation problem is reduced by one.


Input Parameters


nl ≥ 1.


nr ≥ 1.

INTEGER.sqreIf sqre = 0): the lower block is an nr-by-nr square matrixIf sqre = 1): the lower block is an nr-by-(nr+1)rectangular matrix. The bidiagonal matrix has n = nl +

nr + 1 rows and m = n + sqre ≥ n columns.

REAL for slasd2dDOUBLE PRECISION for dlasd2Array, DIMENSION (n). On entry d contains the singularvalues of the two submatrices to be combined.

REAL for slasd2alphaDOUBLE PRECISION for dlasd2

1443


Contains the diagonal element associated with the addedrow.


REAL for slasd2uDOUBLE PRECISION for dlasd2Array, DIMENSION (ldu, n). On entry u contains the leftsingular vectors of two submatrices in the two square blockswith corners at (1,1), (nl, nl), and (nl+2, nl+2), (n,n).

INTEGER. The leading dimension of the array u.ldu

ldu ≥ n.

INTEGER. The leading dimension of the output array u2.

ldu2 ≥ n.

ldu2

REAL for slasd2vtDOUBLE PRECISION for dlasd2Array, DIMENSION (ldvt, m). On entry, vt' contains the rightsingular vectors of two submatrices in the two square blockswith corners at (1,1), (nl+1, nl+1), and (nl+2, nl+2), (m,m).

INTEGER. The leading dimension of the array vt. ldvt ≥m.

ldvt

INTEGER. The leading dimension of the output array vt2.

ldvt2 ≥ m.

ldvt2

INTEGER.idxpWorkspace array, DIMENSION (n). This will contain thepermutation used to place deflated values of D at the endof the array. On output idxp(2:k) points to the nondeflatedd-values and idxp(k+1:n) points to the deflated singularvalues.

INTEGER.idxWorkspace array, DIMENSION (n). This will contain thepermutation used to sort the contents of d into ascendingorder.

1444


INTEGER.coltypWorkspace array, DIMENSION (n). As workspace, this arraycontains a label that indicates which of the following typesa column in the u2 matrix or a row in the vt2 matrix is:1 : non-zero in the upper half only2 : non-zero in the lower half only3 : dense4 : deflated.

INTEGER. Array, DIMENSION (n). This parameter containsthe permutation that separately sorts the two sub-problemsin D into ascending order. Note that entries in the first half

idxq

of this permutation must first be moved one positionbackward; and entries in the second half must first havenl+1 added to their values.

Output Parameters

INTEGER. Contains the dimension of the non-deflated matrix,


k

On exit D contains the trailing (n-k) updated singular values(those which were deflated) sorted into increasing order.

d

On exit u contains the trailing (n-k) updated left singularvectors (those which were deflated) in its last n-k columns.

u

REAL for slasd2zDOUBLE PRECISION for dlasd2Array, DIMENSION (n). On exit, z contains the updating rowvector in the secular equation.

REAL for slasd2dsigmaDOUBLE PRECISION for dlasd2Array, DIMENSION (n). Contains a copy of the diagonalelements (k-1 singular values and one zero) in the secularequation.

REAL for slasd2u2DOUBLE PRECISION for dlasd2

1445


Array, DIMENSION (ldu2, n). Contains a copy of the firstk-1 left singular vectors which will be used by ?lasd3 in amatrix multiply (?gemm) to solve for the new left singularvectors. u2 is arranged into four blocks. The first blockcontains a column with 1 at nl+1 and zero everywhere else;the second block contains non-zero entries only at and abovenl; the third contains non-zero entries only below nl+1;and the fourth is dense.

On exit, vt' contains the trailing (n-k) updated right singularvectors (those which were deflated) in its last n-k columns.In case sqre =1, the last row of vt spans the right nullspace.

vt

REAL for slasd2vt2DOUBLE PRECISION for dlasd2Array, DIMENSION (ldvt2, n). vt2' contains a copy of thefirst k right singular vectors which will be used by ?lasd3in a matrix multiply (?gemm) to solve for the new rightsingular vectors. vt2 is arranged into three blocks. The firstblock contains a row that corresponds to the special 0diagonal element in sigma; the second block containsnon-zeros only at and before nl +1; the third block containsnon-zeros only at and after nl +2.

INTEGER. Array, DIMENSION (n). This will contain thepermutation used to arrange the columns of the deflated umatrix into three groups: the first group contains non-zeroentries only at and above nl, the second contains non-zeroentries only below nl+2, and the third is dense.

idxc

On exit, it is an array of dimension 4, with coltyp(i) beingthe dimension of the i-th type columns.

coltyp

INTEGER.infoIf info = 0): successful exitIf info = -i < 0, the i-th argument had an illegal value.

1446

?lasd3Finds all square roots of the roots of the secularequation, as defined by the values in D and Z, andthen updates the singular vectors by matrixmultiplication. Used by ?bdsdc.

Syntax

call slasd3( nl, nr, sqre, k, d, q, ldq, dsigma, u, ldu, u2, ldu2, vt, ldvt,vt2, ldvt2, idxc, ctot, z, info )

call dlasd3( nl, nr, sqre, k, d, q, ldq, dsigma, u, ldu, u2, ldu2, vt, ldvt,vt2, ldvt2, idxc, ctot, z, info )

Description

The routine ?lasd3 finds all the square roots of the roots of the secular equation, as definedby the values in D and Z.

It makes the appropriate calls to ?lasd4 and then updates the singular vectors by matrixmultiplication.


Input Parameters


nl ≥ 1.


nr ≥ 1.

INTEGER.sqreIf sqre = 0): the lower block is an nr-by-nr square matrix.If sqre = 1): the lower block is an nr-by-(nr+1)rectangular matrix. The bidiagonal matrix has n = nl +

nr + 1 rows and m = n + sqre ≥ n columns.

INTEGER.The size of the secular equation, 1 ≤ k ≤ n.k

REAL for slasd3qDOUBLE PRECISION for dlasd3Workspace array, DIMENSION at least (ldq, k).

1447


INTEGER. The leading dimension of the array Q.ldq

ldq ≥ k.

REAL for slasd3dsigmaDOUBLE PRECISION for dlasd3Array, DIMENSION (k). The first k elements of this arraycontain the old roots of the deflated updating problem. Theseare the poles of the secular equation.

INTEGER. The leading dimension of the array u.ldu

ldu ≥ n.

REAL for slasd3u2DOUBLE PRECISION for dlasd3Array, DIMENSION (ldu2, n).The first k columns of this matrix contain the non-deflatedleft singular vectors for the split problem.

INTEGER. The leading dimension of the array u2.ldu2

ldu2 ≥ n.

INTEGER. The leading dimension of the array vt.ldvt

ldvt ≥ n.

REAL for slasd3vt2DOUBLE PRECISION for dlasd3Array, DIMENSION (ldvt2, n).The first k columns of vt2' contain the non-deflated rightsingular vectors for the split problem.

INTEGER. The leading dimension of the array vt2.ldvt2

ldvt2 ≥ n.

INTEGER. Array, DIMENSION (n).idxcThe permutation used to arrange the columns of u (androws of vt) into three groups: the first group containsnon-zero entries only at and above (or before) nl +1; thesecond contains non-zero entries only at and below (or after)nl+2; and the third is dense. The first column of u and therow of vt are treated separately, however. The rows of thesingular vectors found by ?lasd4 must be likewise permutedbefore the matrix multiplies can take place.

1448


INTEGER. Array, DIMENSION (4). A count of the total numberof the various types of columns in u (or rows in vt), asdescribed in idxc.

ctot

The fourth column type is any column which has beendeflated.

REAL for slasd3zDOUBLE PRECISION for dlasd3Array, DIMENSION (k). The first k elements of this arraycontain the components of the deflation-adjusted updatingrow vector.

Output Parameters

REAL for slasd3dDOUBLE PRECISION for dlasd3Array, DIMENSION (k). On exit the square roots of the rootsof the secular equation, in ascending order.

REAL for slasd3uDOUBLE PRECISION for dlasd3Array, DIMENSION (ldu, n).The last n - k columns of this matrix contain the deflatedleft singular vectors.

REAL for slasd3vtDOUBLE PRECISION for dlasd3Array, DIMENSION (ldvt, m).The last m - k columns of vt' contain the deflated rightsingular vectors.

Destroyed on exit.vt2

Destroyed on exit.z

INTEGER.infoIf info = 0): successful exit.If info = -i < 0, the i-th argument had an illegal value.If info = 1, an singular value did not converge.

1449

Application Notes

This code makes very mild assumptions about floating point arithmetic. It will work on machineswith a guard digit in add/subtract, or on those binary machines without guard digits whichsubtract like the Cray XMP, Cray YMP, Cray C 90, or Cray 2. It could conceivably fail onhexadecimal or decimal machines without guard digits, but we know of none.

?lasd4Computes the square root of the i-th updatedeigenvalue of a positive symmetric rank-onemodification to a positive diagonal matrix. Used by?bdsdc.

Syntax

call slasd4( n, i, d, z, delta, rho, sigma, work, info )

call dlasd4( n, i, d, z, delta, rho, sigma, work, info )

Description

This routine computes the square root of the i-th updated eigenvalue of a positive symmetricrank-one modification to a positive diagonal matrix whose entries are given as the squares of

the corresponding entries in the array d, and that 0 ≤ d(i) < d(j) for i < j and that rho> 0. This is arranged by the calling routine, and is no loss in generality. The rank-one modifiedsystem is thus

diag( d )* diag( d ) + rho*Z*Z_transpose,

where we assume the Euclidean norm of Z is 1.The method consists of approximating therational functions in the secular equation by simpler interpolating rational functions.

Input Parameters

INTEGER. The length of all arrays.n

INTEGER.i

The index of the eigenvalue to be computed. 1 ≤ i ≤ n.

REAL for slasd4dDOUBLE PRECISION for dlasd4Array, DIMENSION (n).

1450

The original eigenvalues. It is assumed that they are in

order, 0 ≤ d(i) < d(j) for i < j.

REAL for slasd4zDOUBLE PRECISION for dlasd4Array, DIMENSION (n).The components of the updating vector.

REAL for slasd4rhoDOUBLE PRECISION for dlasd4The scalar in the symmetric updating formula.

REAL for slasd4workDOUBLE PRECISION for dlasd4Workspace array, DIMENSION (n ).

If n ≠ 1, work contains (d(j) + sigma_i) in its j-thcomponent.If n = 1, then work( 1 ) = 1.

Output Parameters

REAL for slasd4deltaDOUBLE PRECISION for dlasd4Array, DIMENSION (n).

If n ≠ 1, delta contains (d(j) - sigma_i) in its j-thcomponent.If n = 1, then delta (1) = 1. The vector delta containsthe information necessary to construct the (singular)eigenvectors.

REAL for slasd4sigmaDOUBLE PRECISION for dlasd4The computed sigma_i, the i-th updated eigenvalue.

INTEGER.info= 0: successful exit> 0: If info = 1, the updating process failed.

1451

?lasd5Computes the square root of the i-th eigenvalueof a positive symmetric rank-one modification ofa 2-by-2 diagonal matrix.Used by ?bdsdc.

Syntax

call slasd5( i, d, z, delta, rho, dsigma, work )

call dlasd5( i, d, z, delta, rho, dsigma, work )

Description

This routine computes the square root of the i-th eigenvalue of a positive symmetric rank-onemodification of a 2-by-2 diagonal matrix diag( d )* diag( d ) + rho*Z*Z_transpose

The diagonal entries in the array d are assumed to satisfy 0 ≤ d(i) < d(j) for i < j .Wealso assume rho > 0 and that the Euclidean norm of the vector Z is one.

Input Parameters

INTEGER.The index of the eigenvalue to be computed. i =1 or i = 2.

i

REAL for slasd5dDOUBLE PRECISION for dlasd5Array, dimension (2 ).

The original eigenvalues. We assume 0 ≤ d(1) < d(2).

REAL for slasd5zDOUBLE PRECISION for dlasd5Array, dimension ( 2 ).The components of the updating vector.

REAL for slasd5rhoDOUBLE PRECISION for dlasd5The scalar in the symmetric updating formula.

REAL for slasd5workDOUBLE PRECISION for dlasd5.Workspace array, dimension ( 2 ). Contains (d(j) +sigma_i) in its j-th component.

1452

Output Parameters

REAL for slasd5deltaDOUBLE PRECISION for dlasd5.Array, dimension ( 2 ).Contains (d(j) - sigma_i) in its j-th component. Thevector delta contains the information necessary to constructthe eigenvectors.

REAL for slasd5dsigmaDOUBLE PRECISION for dlasd5.The computed sigma_i , the i-th updated eigenvalue.

?lasd6Computes the SVD of an updated upper bidiagonalmatrix obtained by merging two smaller ones byappending a row. Used by ?bdsdc.

Syntax

call slasd6( icompq, nl, nr, sqre, d, vf, vl, alpha, beta, idxq, perm, givptr,givcol, ldgcol, givnum, ldgnum, poles, difl, difr, z, k, c, s, work, iwork,info )

call dlasd6( icompq, nl, nr, sqre, d, vf, vl, alpha, beta, idxq, perm, givptr,givcol, ldgcol, givnum, ldgnum, poles, difl, difr, z, k, c, s, work, iwork,info )

Description

The routine ?lasd6 computes the SVD of an updated upper bidiagonal matrix B obtained bymerging two smaller ones by appending a row. This routine is used only for the problem whichrequires all singular values and optionally singular vector matrices in factored form. B is ann-by-m matrix with n = nl + nr + 1 and m = n + sqre. A related subroutine, ?lasd1,handles the case in which all singular values and singular vectors of the bidiagonal matrix aredesired. ?lasd6 computes the SVD as follows:

1453


= U(out*(D(out)*VT(out)

where Z' = (Z1' a Z2' b) = u' VT', and u is a vector of dimension m with alpha and beta inthe nl+1 and nl+2-th entries and zeros elsewhere; and the entry b is empty if sqre = 0.

The singular values of B can be computed using D1, D2, the first components of all the rightsingular vectors of the lower block, and the last components of all the right singular vectors ofthe upper block. These components are stored and updated in vf and vl, respectively, in?lasd6. Hence U and VT are not explicitly referenced.

The singular values are stored in D. The algorithm consists of two stages:

1. The first stage consists of deflating the size of the problem when there are multiple singularvalues or if there is a zero in the Z vector. For each such occurrence the dimension of thesecular equation problem is reduced by one. This stage is performed by the routine ?lasd7.

2. The second stage consists of calculating the updated singular values. This is done by findingthe roots of the secular equation via the routine ?lasd4 (as called by ?lasd8). This routinealso updates vf and vl and computes the distances between the updated singular valuesand the old singular values. ?lasd6 is called from ?lasda.

Input Parameters

INTEGER. Specifies whether singular vectors are to becomputed in factored form:

icompq

= 0: Compute singular values only= 1: Compute singular vectors in factored form as well.


nl ≥ 1.


nr ≥ 1.

INTEGER .sqre= 0: the lower block is an nr-by-nr square matrix.

1454


= 1: the lower block is an nr-by-(nr+1) rectangular matrix.The bidiagonal matrix has row dimension n=nl+nr+1, andcolumn dimension m = n + sqre.

REAL for slasd6dDOUBLE PRECISION for dlasd6Array, dimension ( nl+nr+1 ). On entry d(1:nl,1:nl)contains the singular values of the upper block, andd(nl+2:n) contains the singular values of the lower block.

REAL for slasd6vfDOUBLE PRECISION for dlasd6Array, dimension ( m ).On entry, vf(1:nl+1) contains the first components of allright singular vectors of the upper block; and vf(nl+2:m)contains the first components of all right singular vectorsof the lower block.

REAL for slasd6vlDOUBLE PRECISION for dlasd6Array, dimension ( m ).On entry, vl(1:nl+1) contains the last components of allright singular vectors of the upper block; and vl(nl+2:m)contains the last components of all right singular vectors ofthe lower block.

REAL for slasd6alphaDOUBLE PRECISION for dlasd6Contains the diagonal element associated with the addedrow.


INTEGER.The leading dimension of the output array givcol,must be at least n.

ldgcol

INTEGER.ldgnumThe leading dimension of the output arrays givnum andpoles, must be at least n.

REAL for slasd6workDOUBLE PRECISION for dlasd6

1455


Workspace array, dimension ( 4m ).

INTEGER.iwork

Workspace array, dimension ( 3n ).

Output Parameters

On exit d(1:n) contains the singular values of the modifiedmatrix.

d

On exit, vf contains the first components of all right singularvectors of the bidiagonal matrix.

vf

On exit, vl contains the last components of all right singularvectors of the bidiagonal matrix.

vl

On exit, the diagonal element associated with the addedrow deflated by max( abs( alpha ), abs( beta ),abs( D(I) ) ), I = 1,n.

alpha

On exit, the off-diagonal element associated with the addedrow deflated by max( abs( alpha ), abs( beta ),abs( D(I) ) ), I = 1,n.

beta

INTEGER.idxqArray, dimension (n). This contains the permutation whichwill reintegrate the subproblem just solved back into sortedorder, that is, d( idxq( i = 1, n ) ) will be in ascendingorder.

INTEGER.perm

Array, dimension (n). The permutations (from deflationand sorting) to be applied to each block. Not referenced ificompq = 0.

INTEGER. The number of Givens rotations which took placein this subproblem. Not referenced if icompq = 0.

givptr

INTEGER.givcol

Array, dimension ( ldgcol, 2 ). Each pair of numbersindicates a pair of columns to take place in a Givens rotation.Not referenced if icompq = 0.

REAL for slasd6givnumDOUBLE PRECISION for dlasd6

1456


Array, dimension ( ldgnum, 2 ). Each number indicates theC or S value to be used in the corresponding Givens rotation.Not referenced if icompq = 0.

REAL for slasd6polesDOUBLE PRECISION for dlasd6Array, dimension ( ldgnum, 2 ). On exit, poles(1,*) is anarray containing the new singular values obtained fromsolving the secular equation, and poles(2,*) is an arraycontaining the poles in the secular equation. Not referencedif icompq = 0.

REAL for slasd6diflDOUBLE PRECISION for dlasd6Array, dimension (n). On exit, difl(i) is the distancebetween i-th updated (undeflated) singular value and thei-th (undeflated) old singular value.

REAL for slasd6difrDOUBLE PRECISION for dlasd6Array, dimension (ldgnum, 2 ) if icompq = 1 anddimension (n) if icompq = 0.On exit, difr(i, 1) is the distance between i-th updated(undeflated) singular value and the i+1-th (undeflated) oldsingular value. If icompq = 1, difr(1: k, 2) is an arraycontaining the normalizing factors for the right singularvector matrix.See ?lasd8 for details on difl and difr.

REAL for slasd6zDOUBLE PRECISION for dlasd6Array, dimension ( m ).The first elements of this array contain the components ofthe deflation-adjusted updating row vector.

INTEGER. Contains the dimension of the non-deflated matrix.


k

REAL for slasd6cDOUBLE PRECISION for dlasd6c contains garbage if sqre =0 and the C-value of a Givensrotation related to the right null space if

1457


sqre = 1.

REAL for slasd6sDOUBLE PRECISION for dlasd6s contains garbage if sqre =0 and the S-value of a Givensrotation related to the right null space ifsqre = 1.

INTEGER.info

= 0: successful exit.< 0: if info = -i, the i-th argument had an illegal value.> 0: if info = 1, an singular value did not converge

?lasd7Merges the two sets of singular values togetherinto a single sorted set. Then it tries to deflate thesize of the problem. Used by ?bdsdc.

Syntax

call slasd7( icompq, nl, nr, sqre, k, d, z, zw, vf, vfw, vl, vlw, alpha, beta,dsigma, idx, idxp, idxq, perm, givptr, givcol, ldgcol, givnum, ldgnum, c, s,info )

call dlasd7( icompq, nl, nr, sqre, k, d, z, zw, vf, vfw, vl, vlw, alpha, beta,dsigma, idx, idxp, idxq, perm, givptr, givcol, ldgcol, givnum, ldgnum, c, s,info )

Description

The routine ?lasd7 merges the two sets of singular values together into a single sorted set.Then it tries to deflate the size of the problem. There are two ways in which deflation can occur:when two or more singular values are close together or if there is a tiny entry in the Z vector.For each such occurrence the order of the related secular equation problem is reduced by one.?lasd7 is called from ?lasd6.

Input Parameters

INTEGER. Specifies whether singular vectors are to becomputed in compact form, as follows:

icompq

= 0: Compute singular values only.

1458

= 1: Compute singular vectors of upper bidiagonal matrixin compact form.


nl ≥ 1.


nr ≥ 1.

INTEGER.sqre= 0: the lower block is an nr-by-nr square matrix.= 1: the lower block is an nr-by-(nr+1) rectangular matrix.The bidiagonal matrix has n = nl + nr + 1 rows and m

= n + sqre ≥ n columns.

REAL for slasd7dDOUBLE PRECISION for dlasd7Array, DIMENSION (n). On entry d contains the singularvalues of the two submatrices to be combined.

REAL for slasd7zwDOUBLE PRECISION for dlasd7Array, DIMENSION ( m ).Workspace for z.

REAL for slasd7vfDOUBLE PRECISION for dlasd7Array, DIMENSION ( m ). On entry, vf(1:nl+1) containsthe first components of all right singular vectors of the upperblock; and vf(nl+2:m) contains the first components of allright singular vectors of the lower block.

REAL for slasd7vfwDOUBLE PRECISION for dlasd7Array, DIMENSION ( m ).Workspace for vf.

REAL for slasd7vlDOUBLE PRECISION for dlasd7Array, DIMENSION ( m ).

1459


On entry, vl(1:nl+1) contains the last components of allright singular vectors of the upper block; and vl(nl+2:m)contains the last components of all right singular vectors ofthe lower block.

REAL for slasd7VLWDOUBLE PRECISION for dlasd7Array, DIMENSION ( m ).Workspace for VL.

REAL for slasd7alphaDOUBLE PRECISION for dlasd7.Contains the diagonal element associated with the addedrow.


INTEGER.idxWorkspace array, DIMENSION (n). This will contain thepermutation used to sort the contents of d into ascendingorder.

INTEGER.idxpWorkspace array, DIMENSION (n). This will contain thepermutation used to place deflated values of d at the endof the array.

INTEGER.idxqArray, DIMENSION (n).This contains the permutation which separately sorts thetwo sub-problems in d into ascending order. Note thatentries in the first half of this permutation must first bemoved one position backward; and entries in the secondhalf must first have nl+1 added to their values.

INTEGER.The leading dimension of the output array givcol,must be at least n.

ldgcol

INTEGER. The leading dimension of the output array givnum,must be at least n.

ldgnum

1460


Output Parameters

INTEGER. Contains the dimension of the non-deflated matrix,this is the order of the related secular equation.

k

1 ≤ k ≤ n.

On exit, d contains the trailing (n-k) updated singular values(those which were deflated) sorted into increasing order.

d

REAL for slasd7zDOUBLE PRECISION for dlasd7.Array, DIMENSION (m).On exit, Z contains the updating row vector in the secularequation.

On exit, vf contains the first components of all right singularvectors of the bidiagonal matrix.

vf

On exit, vl contains the last components of all right singularvectors of the bidiagonal matrix.

vl

REAL for slasd7dsigmaDOUBLE PRECISION for dlasd7.Array, DIMENSION (n). Contains a copy of the diagonalelements (k-1 singular values and one zero) in the secularequation.

On output, idxp(2: k) points to the nondeflated d-valuesand idxp( k+1:n) points to the deflated singular values.

idxp

INTEGER.permArray, DIMENSION (n).The permutations (from deflation and sorting) to be appliedto each singular block. Not referenced if icompq = 0.

INTEGER.givptrThe number of Givens rotations which took place in thissubproblem. Not referenced if icompq = 0.

INTEGER.givcolArray, DIMENSION ( ldgcol, 2 ). Each pair of numbersindicates a pair of columns to take place in a Givens rotation.Not referenced if icompq = 0.

REAL for slasd7givnumDOUBLE PRECISION for dlasd7.

1461


Array, DIMENSION ( ldgnum, 2 ). Each number indicates theC or S value to be used in the corresponding Givens rotation.Not referenced if icompq = 0.

REAL for slasd7.cDOUBLE PRECISION for dlasd7.c contains garbage if sqre =0 and the C-value of a Givensrotation related to the right null space ifsqre = 1.

REAL for slasd7.SDOUBLE PRECISION for dlasd7.s contains garbage if sqre =0 and the S-value of a Givensrotation related to the right null space ifsqre = 1.

INTEGER.info= 0: successful exit.< 0: if info = -i, the i-th argument had an illegal value.

?lasd8Finds the square roots of the roots of the secularequation, and stores, for each element in D, thedistance to its two nearest poles. Used by ?bdsdc.

Syntax

call slasd8( icompq, k, d, z, vf, vl, difl, difr, lddifr, dsigma, work, info)

call dlasd8( icompq, k, d, z, vf, vl, difl, difr, lddifr, dsigma, work, info)

Description

The routine ?lasd8 finds the square roots of the roots of the secular equation, as defined bythe values in dsigma and z. It makes the appropriate calls to ?lasd4, and stores, for eachelement in d, the distance to its two nearest poles (elements in dsigma). It also updates thearrays vf and vl, the first and last components of all the right singular vectors of the originalbidiagonal matrix. ?lasd8 is called from ?lasd6.

1462

Input Parameters

INTEGER. Specifies whether singular vectors are to becomputed in factored form in the calling routine:

icompq

= 0: Compute singular values only.= 1: Compute singular vectors in factored form as well.


be solved by ?lasd4. k ≥ 1.

k

REAL for slasd8zDOUBLE PRECISION for dlasd8.Array, DIMENSION ( k ).The first k elements of this array contain the componentsof the deflation-adjusted updating row vector.

REAL for slasd8vfDOUBLE PRECISION for dlasd8.Array, DIMENSION ( k ).On entry, vf contains information passed through dbede8.

REAL for slasd8vlDOUBLE PRECISION for dlasd8.Array, DIMENSION ( k ). On entry, vl contains informationpassed through dbede8.

INTEGER. The leading dimension of the output array difr,must be at least k.

lddifr

REAL for slasd8dsigmaDOUBLE PRECISION for dlasd8.Array, DIMENSION ( k ).The first k elements of this array contain the old roots ofthe deflated updating problem. These are the poles of thesecular equation.

REAL for slasd8workDOUBLE PRECISION for dlasd8.Workspace array, DIMENSION at least (3k).

Output Parameters

REAL for slasd8dDOUBLE PRECISION for dlasd8.

1463


Array, DIMENSION ( k ).On output, D contains the updated singular values.

On exit, vf contains the first k components of the firstcomponents of all right singular vectors of the bidiagonalmatrix.

vf

On exit, vl contains the first k components of the lastcomponents of all right singular vectors of the bidiagonalmatrix.

vl

REAL for slasd8diflDOUBLE PRECISION for dlasd8.Array, DIMENSION ( k ). On exit, difl(i) = d(i) -dsigma(i).

REAL for slasd8difrDOUBLE PRECISION for dlasd8.Array,DIMENSION ( lddifr, 2 ) if icompq = 1 andDIMENSION ( k ) if icompq = 0.On exit, difr(i,1) = d(i) - dsigma(i+1), difr(k,1)is not defined and will not be referenced. If icompq = 1,difr(1:k,2) is an array containing the normalizing factorsfor the right singular vector matrix.

INTEGER.info= 0: successful exit.< 0: if info = -i, the i-th argument had an illegal value.> 0: If info = 1, an singular value did not converge.

?lasd9Finds the square roots of the roots of the secularequation, and stores, for each element in D, thedistance to its two nearest poles. Used by ?bdsdc.

Syntax

call slasd9( icompq, ldu, k, d, z, vf, vl, difl, difr, dsigma, work, info )

call dlasd9( icompq, ldu, k, d, z, vf, vl, difl, difr, dsigma, work, info )

1464

Description

The routine ?lasd9 finds the square roots of the roots of the secular equation, as defined bythe values in dsigma and z. It makes the appropriate calls to ?lasd4, and stores, for eachelement in d, the distance to its two nearest poles (elements in dsigma). It also updates thearrays vf and vl, the first and last components of all the right singular vectors of the originalbidiagonal matrix. ?lasd9 is called from ?lasd7.

Input Parameters

INTEGER. Specifies whether singular vectors are to becomputed in factored form in the calling routine:

icompq

If icompq = 0, compute singular values only;If icompq = 1, compute singular vector matrices in factoredform also.


be solved by slasd4. k ≥ 1.

k

REAL for slasd9dsigmaDOUBLE PRECISION for dlasd9.Array, DIMENSION(k).The first k elements of this array contain the old roots ofthe deflated updating problem. These are the poles of thesecular equation.

REAL for slasd9zDOUBLE PRECISION for dlasd9.Array, DIMENSION (k). The first k elements of this arraycontain the components of the deflation-adjusted updatingrow vector.

REAL for slasd9vfDOUBLE PRECISION for dlasd9.Array, DIMENSION(k). On entry, vf contains informationpassed through sbede8.

REAL for slasd9vlDOUBLE PRECISION for dlasd9.Array, DIMENSION(k). On entry, vl contains informationpassed through sbede8.

REAL for slasd9workDOUBLE PRECISION for dlasd9.

1465


Workspace array, DIMENSION at least (3k).

Output Parameters

REAL for slasd9dDOUBLE PRECISION for dlasd9.Array, DIMENSION(k). d(i) contains the updated singularvalues.

On exit, vf contains the first k components of the firstcomponents of all right singular vectors of the bidiagonalmatrix.

vf

On exit, vl contains the first k components of the lastcomponents of all right singular vectors of the bidiagonalmatrix.

vl

REAL for slasd9diflDOUBLE PRECISION for dlasd9.Array, DIMENSION (k).On exit, difl(i) = d(i) - dsigma(i).

REAL for slasd9difrDOUBLE PRECISION for dlasd9.Array,DIMENSION (ldu, 2) if icompq =1 andDIMENSION (k) if icompq = 0.On exit, difr(i, 1) = d(i) - dsigma(i+1), difr(k, 1)is not defined and will not be referenced.If icompq = 1, difr(1:k, 2) is an array containing thenormalizing factors for the right singular vector matrix.

INTEGER.info= 0: successful exit.< 0: if info = -i, the i-th argument had an illegal value.> 0: If info = 1, an singular value did not converge

1466

?lasdaComputes the singular value decomposition (SVD)of a real upper bidiagonal matrix with diagonal dand off-diagonal e. Used by ?bdsdc.

Syntax

call slasda( icompq, smlsiz, n, sqre, d, e, u, ldu, vt, k, difl, difr, z,poles, givptr, givcol, ldgcol, perm, givnum, c, s, work, iwork, info )

call dlasda( icompq, smlsiz, n, sqre, d, e, u, ldu, vt, k, difl, difr, z,poles, givptr, givcol, ldgcol, perm, givnum, c, s, work, iwork, info )

Description

Using a divide and conquer approach, ?lasda computes the singular value decomposition (SVD)of a real upper bidiagonal n-by-m matrix B with diagonal d and off-diagonal e, where m = n +sqre.

The algorithm computes the singular values in the SVD B = U*S*VT. The orthogonal matricesU and VT are optionally computed in compact form. A related subroutine ?lasd0 computes thesingular values and the singular vectors in explicit form.

Input Parameters

INTEGER.icompqSpecifies whether singular vectors are to be computed incompact form, as follows:= 0: Compute singular values only.= 1: Compute singular vectors of upper bidiagonal matrixin compact form.

INTEGER.smlsizThe maximum size of the subproblems at the bottom of thecomputation tree.

INTEGER. The row dimension of the upper bidiagonal matrix.This is also the dimension of the main diagonal array d.

n

INTEGER. Specifies the column dimension of the bidiagonalmatrix.

sqre

If sqre = 0: the bidiagonal matrix has column dimensionm = n

1467


If sqre = 1: the bidiagonal matrix has column dimensionm = n + 1.

REAL for slasdadDOUBLE PRECISION for dlasda.Array, DIMENSION (n). On entry, d contains the maindiagonal of the bidiagonal matrix.

REAL for slasdaeDOUBLE PRECISION for dlasda.Array, DIMENSION ( m - 1 ). Contains the subdiagonal entriesof the bidiagonal matrix. On exit, e is destroyed.

INTEGER. The leading dimension of arrays u, vt, difl, difr,

poles, givnum, and z. ldu ≥ n.

ldu

INTEGER. The leading dimension of arrays givcol and perm.

ldgcol ≥ n.

ldgcol

REAL for slasdaworkDOUBLE PRECISION for dlasda.Workspace array, DIMENSION (6 * n + (smlsiz + 1)2).

INTEGER.iworkWorkspace array, Dimension must be at least (7 * n).

Output Parameters

On exit d, if info = 0, contains the singular values of thebidiagonal matrix.

d

REAL for slasdauDOUBLE PRECISION for dlasda.Array, DIMENSION (ldu, smlsiz) if icompq =1.Not referenced if icompq = 0.If icompq = 1, on exit, u contains the left singular vectormatrices of all subproblems at the bottom level.

REAL for slasdavtDOUBLE PRECISION for dlasda.Array, DIMENSION ( ldu, smlsiz+1 ) if icompq = 1, andnot referenced if icompq = 0. If icompq = 1, on exit, vt'contains the right singular vector matrices of all subproblemsat the bottom level.

1468


INTEGER.kArray, DIMENSION (n) if icompq = 1 andDIMENSION (1) if icompq = 0.If icompq = 1, on exit, k(i) is the dimension of the i-thsecular equation on the computation tree.

REAL for slasdadiflDOUBLE PRECISION for dlasda.Array, DIMENSION ( ldu, nlvl ),where nlvl = floor (log2 (n/smlsiz))).

REAL for slasdadifrDOUBLE PRECISION for dlasda.Array,DIMENSION ( ldu, 2 nlvl ) if icompq = 1 andDIMENSION (n) if icompq = 0.If icompq = 1, on exit, difl(1:n, i) and difr(1:n,2i -1)record distances between singular values on the i-th leveland singular values on the (i -1)-th level, and difr(1:n, 2i) contains the normalizing factors for the right singularvector matrix. See ?lasd8 for details.

REAL for slasdazDOUBLE PRECISION for dlasda.Array,DIMENSION ( ldu, nlvl ) if icompq = 1 andDIMENSION (n) if icompq = 0. The first k elements of z(1,i) contain the components of the deflation-adjusted updatingrow vector for subproblems on the i-th level.

REAL for slasdapolesDOUBLE PRECISION for dlasdaArray, DIMENSION ( ldu, 2*nlvl )if icompq = 1, and not referenced if icompq = 0. If icompq= 1, on exit, poles(1, 2i - 1) and poles(1, 2i) contain thenew and old singular values involved in the secular equationson the i-th level.

INTEGER. Array, DIMENSION (n) if icompq = 1, and notreferenced if icompq = 0. If icompq = 1, on exit, givptr(i ) records the number of Givens rotations performed onthe i-th problem on the computation tree.

givptr

1469


INTEGER .givcolArray, DIMENSION ( ldgcol, 2*nlvl ) if icompq = 1, andnot referenced if icompq = 0. If icompq = 1, on exit, foreach i, givcol(1, 2 i - 1) and givcol(1, 2 i) record thelocations of Givens rotations performed on the i-th level onthe computation tree.

INTEGER . Array, DIMENSION ( ldgcol, nlvl ) if icompq =1, and not referenced if icompq = 0. If icompq = 1, onexit, perm (1, i) records permutations done on the i-thlevel of the computation tree.

perm

REAL for slasdagivnumDOUBLE PRECISION for dlasda.Array DIMENSION ( ldu, 2*nlvl ) if icompq = 1, and notreferenced if icompq = 0. If icompq = 1, on exit, for eachi, givnum(1, 2 i - 1) and givnum(1, 2 i) record the C- andS-values of Givens rotations performed on the i-th level onthe computation tree.

REAL for slasdacDOUBLE PRECISION for dlasda.Array,DIMENSION (n) if icompq = 1, andDIMENSION (1) if icompq = 0.If icompq = 1 and the i-th subproblem is not square, onexit, c(i) contains the C-value of a Givens rotation relatedto the right null space of the i-th subproblem.

REAL for slasdasDOUBLE PRECISION for dlasda.Array,DIMENSION (n) icompq = 1, andDIMENSION (1) if icompq = 0.If icompq = 1 and the i-th subproblem is not square, onexit, s(i) contains the S-value of a Givens rotation relatedto the right null space of the i-th subproblem.

INTEGER.info= 0: successful exit.< 0: if info = -i, the i-th argument had an illegal value> 0: If info = 1, an singular value did not converge

1470

?lasdqComputes the SVD of a real bidiagonal matrix withdiagonal d and off-diagonal e. Used by ?bdsdc.

Syntax

call slasdq( uplo, sqre, n, ncvt, nru, ncc, d, e, vt, ldvt, u, ldu, c, ldc,work, info )

call dlasdq( uplo, sqre, n, ncvt, nru, ncc, d, e, vt, ldvt, u, ldu, c, ldc,work, info )

Description

The routine ?lasdq computes the singular value decomposition (SVD) of a real (upper or lower)bidiagonal matrix with diagonal d and off-diagonal e, accumulating the transformations ifdesired. Letting B denote the input bidiagonal matrix, the algorithm computes orthogonalmatrices Q and P such that B = Q*S*P' (P' denotes the transpose of P). The singular values Sare overwritten on d.

The input matrix U is changed to U*Q if desired.

The input matrix VT is changed to P' *VT if desired.

The input matrix C is changed to Q'*C if desired.

Input Parameters

CHARACTER*1. On entry, uplo specifies whether the inputbidiagonal matrix is upper or lower bidiagonal.

uplo

If uplo = 'U' or 'u' , B is upper bidiagonal;If uplo = 'L' or 'l' , B is lower bidiagonal.

INTEGER.sqre= 0: then the input matrix is n-by-n.= 1: then the input matrix is n-by-(n+1) if uplu = 'U' and(n+1)-by-n if uplu= 'L'. The bidiagonal matrix has n = nl + nr + 1 rows

and m = n + sqre ≥ n columns.

INTEGER. On entry, n specifies the number of rows andcolumns in the matrix. n must be at least 0.

n

1471


INTEGER. On entry, ncvt specifies the number of columnsof the matrix VT. ncvt must be at least 0.

ncvt

INTEGER. On entry, nru specifies the number of rows of thematrix U. nru must be at least 0.

nru

INTEGER. On entry, ncc specifies the number of columnsof the matrix C. ncc must be at least 0.

ncc

REAL for slasdqdDOUBLE PRECISION for dlasdq.Array, DIMENSION (n). On entry, d contains the diagonalentries of the bidiagonal matrix whose SVD is desired.

REAL for slasdqeDOUBLE PRECISION for dlasdq.Array, DIMENSION is (n-1) if sqre = 0 and n if sqre = 1.On entry, the entries of e contain the off-diagonal entriesof the bidiagonal matrix whose SVD is desired.

REAL for slasdqvtDOUBLE PRECISION for dlasdq.Array, DIMENSION (ldvt, ncvt). On entry, contains a matrixwhich on exit has been premultiplied by P', dimensionn-by-ncvt if sqre = 0 and (n+1)-by-ncvt if sqre = 1 (notreferenced if ncvt=0).

INTEGER. On entry, ldvt specifies the leading dimensionof vt as declared in the calling (sub) program. ldvt mustbe at least 1. If ncvt is nonzero, ldvt must also be at leastn.

ldvt

REAL for slasdquDOUBLE PRECISION for dlasdq.Array, DIMENSION (ldu, n). On entry, contains a matrixwhich on exit has been postmultiplied by Q, dimensionnru-by-n if sqre = 0 and nru-by-(n+1) if sqre = 1 (notreferenced if nru=0).

INTEGER. On entry, ldu specifies the leading dimension ofu as declared in the calling (sub) program. ldu must be atleast max(1, nru ) .

ldu

REAL for slasdqcDOUBLE PRECISION for dlasdq.

1472


Array, DIMENSION (ldc, ncc). On entry, contains ann-by-ncc matrix which on exit has been premultiplied byQ', dimension n-by-ncc if sqre = 0 and (n+1)-by-ncc ifsqre = 1 (not referenced if ncc=0).

INTEGER. On entry, ldc specifies the leading dimension ofC as declared in the calling (sub) program. ldc must be atleast 1. If ncc is non-zero, ldc must also be at least n.

ldc

REAL for slasdqworkDOUBLE PRECISION for dlasdq.Array, DIMENSION (4n). This is a workspace array. Onlyreferenced if one of ncvt, nru, or ncc is nonzero, and if nis at least 2.

Output Parameters

On normal exit, d contains the singular values in ascendingorder.

d

On normal exit, e will contain 0. If the algorithm does notconverge, d and e will contain the diagonal andsuperdiagonal entries of a bidiagonal matrix orthogonallyequivalent to the one given as input.

e

On exit, the matrix has been premultiplied by P'.vt

On exit, the matrix has been postmultiplied by Q.u

On exit, the matrix has been premultiplied by Q'.c

INTEGER. On exit, a value of 0 indicates a successful exit.If info < 0, argument number -info is illegal. If info >0, the algorithm did not converge, and info specifies howmany superdiagonals did not converge.

info

1473

?lasdtCreates a tree of subproblems for bidiagonal divideand conquer. Used by ?bdsdc.

Syntax

call slasdt( n, lvl, nd, inode, ndiml, ndimr, msub )

call dlasdt( n, lvl, nd, inode, ndiml, ndimr, msub )

Description

The routine creates a tree of subproblems for bidiagonal divide and conquer.

Input Parameters

INTEGER. On entry, the number of diagonal elements of thebidiagonal matrix.

n

INTEGER. On entry, the maximum row dimension eachsubproblem at the bottom of the tree can be of.

msub

Output Parameters

INTEGER. On exit, the number of levels on the computationtree.

lvl

INTEGER. On exit, the number of nodes on the tree.nd

INTEGER.inodeArray, DIMENSION (n). On exit, centers of subproblems.

INTEGER .ndimlArray, DIMENSION (n). On exit, row dimensions of leftchildren.

INTEGER .ndimrArray, DIMENSION (n). On exit, row dimensions of rightchildren.

1474


?lasetInitializes the off-diagonal elements and thediagonal elements of a matrix to given values.

Syntax

call slaset( uplo, m, n, alpha, beta, a, lda )

call dlaset( uplo, m, n, alpha, beta, a, lda )

call claset( uplo, m, n, alpha, beta, a, lda )

call zlaset( uplo, m, n, alpha, beta, a, lda )

Description

The routine initializes an m-by-n matrix A to beta on the diagonal and alpha on the off-diagonals.

Input Parameters

CHARACTER*1. Specifies the part of the matrix A to be set.uploIf uplo = 'U', upper triangular part is set; the strictlylower triangular part of A is not changed.If uplo = 'L': lower triangular part is set; the strictlyupper triangular part of A is not changed.Otherwise: All of the matrix A is set.



n ≥ 0.

REAL for slasetalpha, betaDOUBLE PRECISION for dlasetCOMPLEX for clasetCOMPLEX*16 for zlaset.The constants to which the off-diagonal and diagonalelements are to be set, respectively.

REAL for slasetaDOUBLE PRECISION for dlasetCOMPLEX for clasetCOMPLEX*16 for zlaset.

1475


Array, DIMENSION (lda, n).On entry, the m-by-n matrix A.


lda ≥ max(1,m).

Output Parameters

On exit, the leading m-by-n submatrix of A is set as follows:a

if uplo = 'U', A(i,j) = alpha, 1≤ i ≤ j-1, 1≤ j ≤n,

if uplo = 'L', A(i,j) = alpha, j+1≤ i ≤ m, 1≤ j ≤n,

otherwise, A(i,j) = alpha, 1≤ i ≤ m, 1≤ j ≤ n, i ≠j,

and, for all uplo, A(i,i) = beta, 1≤ i ≤ min(m, n).

?lasq1Computes the singular values of a real squarebidiagonal matrix. Used by ?bdsqr.

Syntax

call slasq1( n, d, e, work, info )

call dlasq1( n, d, e, work, info )

Description

The routine ?lasq1 computes the singular values of a real n-by-n bidiagonal matrix with diagonald and off-diagonal e. The singular values are computed to high relative accuracy, in the absenceof denormalization, underflow and overflow.

Input Parameters

INTEGER.The number of rows and columns in the matrix. n

≥ 0.

n

REAL for slasq1dDOUBLE PRECISION for dlasq1.

1476


Array, DIMENSION (n).On entry, d contains the diagonal elements of the bidiagonalmatrix whose SVD is desired.

REAL for slasq1eDOUBLE PRECISION for dlasq1.Array, DIMENSION (n).On entry, elements e(1:n-1) contain the off-diagonalelements of the bidiagonal matrix whose SVD is desired.

REAL for slasq1workDOUBLE PRECISION for dlasq1.Workspace array, DIMENSION (4n).

Output Parameters

On normal exit, d contains the singular values in decreasingorder.

d

On exit, e is overwritten.e

INTEGER.info= 0: successful exit;< 0: if info = -i, the i-th argument had an illegal value;> 0: the algorithm failed:= 1, a split was marked by a positive value in e;= 2, current block of z not diagonalized after 30*n iterations(in inner while loop);= 3, termination criterion of outer while loop not met(program created more than n unreduced blocks.

?lasq2Computes all the eigenvalues of the symmetricpositive definite tridiagonal matrix associated withthe qd array z to high relative accuracy. Used by?bdsqr and ?stegr.

Syntax

call slasq2( n, z, info )

call dlasq2( n, z, info )

1477

Description

The routine ?lasq2 computes all the eigenvalues of the symmetric positive definite tridiagonalmatrix associated with the qd array z to high relative accuracy, in the absence ofdenormalization, underflow and overflow.

To see the relation of z to the tridiagonal matrix, let L be a unit lower bidiagonal matrix withsubdiagonals z(2,4,6,,..) and let U be an upper bidiagonal matrix with 1's above and diagonalz(1,3,5,,..). The tridiagonal is LU or, if you prefer, the symmetric tridiagonal to which it issimilar.

Input Parameters

INTEGER. The number of rows and columns in the matrix.

n ≥ 0.

n

REAL for slasq2zDOUBLE PRECISION for dlasq2.Array, DIMENSION (4 * n).On entry, z holds the qd array.

Output Parameters

On exit, entries 1 to n hold the eigenvalues in decreasingorder, z(2*n+1) holds the trace, and z(2*n+2) holds thesum of the eigenvalues. If n > 2, then z(2*n+3) holds theiteration count, z(2*n+4) holds ndivs/nin2, and z(2*n+5)holds the percentage of shifts that failed.

z

INTEGER.info= 0: successful exit;< 0: if the i-th argument is a scalar and had an illegal value,then info = -i, if the i-th argument is an array and thej-entry had an illegal value, then info = -(i*100+ j);> 0: the algorithm failed:= 1, a split was marked by a positive value in e;= 2, current block of z not diagonalized after 30*n iterations(in inner while loop);= 3, termination criterion of outer while loop not met(program created more than n unreduced blocks).

1478

Application Notes

The routine ?lasq2 defines a logical variable, ieee, which is .TRUE. on machines which followieee-754 floating-point standard in their handling of infinities and NaNs, and .FALSE. otherwise.This variable is passed to ?lazq3.

?lasq3Checks for deflation, computes a shift and callsdqds. Used by ?bdsqr.

Syntax

call slasq3( i0, n0, z, pp, dmin, sigma, desig, qmax, nfail, iter, ndiv, ieee)

call dlasq3( i0, n0, z, pp, dmin, sigma, desig, qmax, nfail, iter, ndiv, ieee)

Description

The routine ?lasq3 checks for deflation, computes a shift, and calls dqds. In case of failure, itchanges shifts, and tries again until output is positive.

Input Parameters

INTEGER. First index.i0

INTEGER. Last index.n0

REAL for slasq3zDOUBLE PRECISION for dlasq3.Array, DIMENSION (4n). z holds the qd array.

INTEGER. pp=0 for ping, pp=1 for pong.pp

REAL for slasq3desigDOUBLE PRECISION for dlasq3.Lower order part of sigma.

REAL for slasq3qmaxDOUBLE PRECISION for dlasq3.Maximum value of q.

LOGICAL.ieeeFlag for ieee or non-ieee arithmetic (passed to ?lasq5).

1479


Output Parameters

REAL for slasq3dminDOUBLE PRECISION for dlasq3.Minimum value of d.

REAL for slasq3sigmaDOUBLE PRECISION for dlasq3.Sum of shifts used in current segment.

Lower order part of sigma.desig

INTEGER. Number of times shift was too big.nfail

INTEGER. Number of iterations.iter

INTEGER. Number of divisions.ndiv

?lasq4Computes an approximation to the smallesteigenvalue using values of d from the previoustransform. Used by ?bdsqr.

Syntax

call slasq4( i0, n0, z, pp, n0in, dmin, dmin1, dmin2, dn, dn1, dn2, tau, ttype)

call dlasq4( i0, n0, z, pp, n0in, dmin, dmin1, dmin2, dn, dn1, dn2, tau, ttype)

Description

The routine computes an approximation tau to the smallest eigenvalue using values of d fromthe previous transform.

Input Parameters



REAL for slasq4zDOUBLE PRECISION for dlasq4.Array, DIMENSION (4n).

1480


Z holds the qd array.


INTEGER. The value of n0 at start of eigtest.n0in


REAL for slasq4dmin1DOUBLE PRECISION for dlasq4.Minimum value of d, excluding d(n0).

REAL for slasq4dmin2DOUBLE PRECISION for dlasq4.Minimum value of d, excluding d(n0)and d(n0-1).

REAL for slasq4dnDOUBLE PRECISION for dlasq4. Contains d(n).

REAL for slasq4dn1DOUBLE PRECISION for dlasq4. Contains d(n-1).

REAL for slasq4dn2DOUBLE PRECISION for dlasq4. Contains d(n-2).

Output Parameters

REAL for slasq4tauDOUBLE PRECISION for dlasq4.Shift.

INTEGER. Shift type.ttype

?lasq5Computes one dqds transform in ping-pong form.Used by ?bdsqr and ?stegr.

Syntax

call slasq5( i0, n0, z, pp, tau, dmin, dmin1, dmin2, dn, dnm1, dnm2, ieee )

call dlasq5( i0, n0, z, pp, tau, dmin, dmin1, dmin2, dn, dnm1, dnm2, ieee )

1481


Description

The routine computes one dqds transform in ping-pong form: one version for ieee machines,another for non-ieee machines.

Input Parameters



REAL for slasq5zDOUBLE PRECISION for dlasq5.Array, DIMENSION (4n). z holds the qd array. emin is storedin z(4*n0)to avoid an extra argument.


REAL for slasq5tauDOUBLE PRECISION for dlasq5.This is the shift.

LOGICAL. Flag for IEEE or non-IEEE arithmetic.ieee

Output Parameters



REAL for slasq5dmin2DOUBLE PRECISION for dlasq5.Minimum value of d, excluding d(n0) and d(n0-1).

REAL for slasq5dnDOUBLE PRECISION for dlasq5. Contains d(n0), the lastvalue of d.

REAL for slasq5dnm1DOUBLE PRECISION for dlasq5. Contains d(n0-1).

REAL for slasq5dnm2

1482


DOUBLE PRECISION for dlasq5. Contains d(n0-2).

?lasq6Computes one dqd transform in ping-pong form.Used by ?bdsqr and ?stegr.

Syntax

call slasq6( i0, n0, z, pp, dmin, dmin1, dmin2, dn, dnm1, dnm2 )

call dlasq6( i0, n0, z, pp, dmin, dmin1, dmin2, dn, dnm1, dnm2 )

Description

The routine ?lasq6 computes one dqd (shift equal to zero) transform in ping-pong form, withprotection against underflow and overflow.

Input Parameters



REAL for slasq6zDOUBLE PRECISION for dlasq6.Array, DIMENSION (4n). Z holds the qd array. emin is storedin z(4*n0) to avoid an extra argument.


Output Parameters



REAL for slasq6dmin2DOUBLE PRECISION for dlasq6.Minimum value of d, excluding d(n0) and d(n0-1).

REAL for slasq6dn

1483


DOUBLE PRECISION for dlasq6. Contains d(n0), the lastvalue of d.



?lasrApplies a sequence of plane rotations to a generalrectangular matrix.

Syntax

call slasr( side, pivot, direct, m, n, c, s, a, lda )

call dlasr( side, pivot, direct, m, n, c, s, a, lda )

call clasr( side, pivot, direct, m, n, c, s, a, lda )

call zlasr( side, pivot, direct, m, n, c, s, a, lda )

Description

The routine applies a sequence of plane rotations to a real/complex matrix A, from the left orthe right.

A := P*A, when side = 'L' ( Left-hand side )

A := A*P', when side = 'R' ( Right-hand side )

where P is an orthogonal matrix consisting of a sequence of plane rotations with z = m whenside = 'L' and z = n when side = 'R'.

When direct = 'F' (Forward sequence), then

P = P( z - 1 ) ... P( 2 ) P( 1 ),

and when direct = 'B' (Backward sequence), then

P = P( 1 ) P( 2 ) ... P( z - 1 ),

where P( k ) is a plane rotation matrix defined by the 2-by-2 plane rotation:

1484


When pivot = 'V' ( Variable pivot ), the rotation is performed for the plane (k, k + 1), thatis, P(k) has the form

where R(k) appears as a rank-2 modification to the identity matrix in rows and columns k andk+1.

When pivot = 'T' ( Top pivot ), the rotation is performed for the plane (1,k+1), so P(k)has the form

1485


where R(k) appears in rows and columns k and k+1.

Similarly, when pivot = 'B' ( Bottom pivot ), the rotation is performed for the plane (k,z),giving P(k) the form

where R(k) appears in rows and columns k and z. The rotations are performed without everforming P(k) explicitly.

Input Parameters

CHARACTER*1. Specifies whether the plane rotation matrixP is applied to A on the left or the right.

side

1486


= 'L': left, compute A := P*A= 'R': right, compute A:= A*P'

CHARACTER*1. Specifies whether P is a forward or backwardsequence of plane rotations.

direct

= 'F': forward, P = P( z - 1 )* ... * P( 2 ) * P(1 )= 'B': backward, P = P( 1 )* P( 2 )* ... * P( z -1 )

CHARACTER*1. Specifies the plane for which P(k) is a planerotation matrix.

pivot

= 'V': Variable pivot, the plane (k, k+1)= 'T': Top pivot, the plane (1, k+1)= 'B': Bottom pivot, the plane (k, z)


If m ≤ 1, an immediate return is effected.


If n ≤ 1, an immediate return is effected.

REAL for slasr/clasrc, sDOUBLE PRECISION for dlasr/zlasr.Arrays, DIMENSION(m-1) if side = 'L',(n-1) if side = 'R' .c(k) and s(k) contain the cosine and sine of the planerotations respectively that define the 2-by-2 plane rotationpart (R(k)) of the P(k) matrix as described above inDescription.

REAL for slasraDOUBLE PRECISION for dlasrCOMPLEX for clasrCOMPLEX*16 for zlasr.Array, DIMENSION (lda, n).The m-by-n matrix A.


lda ≥ max(1,m).

1487


Output Parameters

On exit, A is overwritten by P*A if side = 'R', or by A*P'if side = 'L'.

a

?lasrtSorts numbers in increasing or decreasing order.

Syntax

call slasrt( id, n, d, info )

call dlasrt( id, n, d, info )

Description

The routine ?lasrt sorts the numbers in d in increasing order (if id = 'I') or in decreasing

order (if id = 'D'). It uses Quick Sort, reverting to Insertion Sort on arrays of size ≤ 20.Dimension of stack limits n to about 232.

Input Parameters

CHARACTER*1.id= 'I': sort d in increasing order;= 'D': sort d in decreasing order.

INTEGER. The length of the array d.n

REAL for slasrtdDOUBLE PRECISION for dlasrt.On entry, the array to be sorted.

Output Parameters

On exit, d has been sorted into increasing orderd

(d(1) ≤ ... ≤ d(n)) or into decreasing order

(d(1) ≥ ... ≥ d(n)), depending on id.

INTEGER.info= 0: successful exit< 0: if info = -i, the i-th argument had an illegal value.

1488

?lassqUpdates a sum of squares represented in scaledform.

Syntax

call slassq( n, x, incx, scale, sumsq )

call dlassq( n, x, incx, scale, sumsq )

call classq( n, x, incx, scale, sumsq )

call zlassq( n, x, incx, scale, sumsq )

Description

The real routines slassq/dlassq return the values scl and smsq such that

scl2 * smsq = x(1)2 +...+ x(n)2 + scale2 *sumsq,

where x( i ) = x(1 + ( i - 1) incx).

The value of sumsq is assumed to be non-negative and scl returns the value

scl = max( scale, abs(x(i))).

Values scale and sumsq must be supplied in scale and sumsq, and scl and smsq are overwrittenon scale and sumsq, respectively.

The complex routines classq/zlassq return the values scl and ssq such that

scl2 * ssq = x(1)2 +...+ x(n)2 + scale2 *sumsq,

where x( i ) = abs (x(1 + ( i - 1) incx)).

The value of sumsq is assumed to be at least unity and the value of ssq will then satisfy 1.0

≤ ssq ≤ sumsq + 2n

scale is assumed to be non-negative and scl returns the value

scl = max( scale, abs(real(x(i))), abs(aimag(x(i)))).

Values scale and sumsq must be supplied in scale and sumsq, and scl and ssq are overwrittenon scale and sumsq, respectively.

All routines ?lassq make only one pass through the vector x.

1489


Input Parameters

INTEGER. The number of elements to be used from thevector x.

n

REAL for slassqxDOUBLE PRECISION for dlassqCOMPLEX for classqCOMPLEX*16 for zlassq.The vector for which a scaled sum of squares is computed:

x( i ) = x(1 + (i - 1) incx ), 1 ≤ i ≤ n.

INTEGER. The increment between successive values of thevector x. incx > 0.

incx

REAL for slassq/classqscaleDOUBLE PRECISION for dlassq/zlassq.On entry, the value scale in the equation above.

REAL for slassq/classqsumsqDOUBLE PRECISION for dlassq/zlassq.On entry, the value sumsq in the equation above.

Output Parameters

On exit, scale is overwritten with scl, the scaling factorfor the sum of squares.

scale

For real flavors:sumsqOn exit, sumsq is overwritten with the value smsq in theequation above.For complex flavors:On exit, sumsq is overwritten with the value ssq in theequation above.

1490


?lasv2Computes the singular value decomposition of a2-by-2 triangular matrix.

Syntax

call slasv2( f, g, h, ssmin, ssmax, snr, csr, snl, csl )

call dlasv2( f, g, h, ssmin, ssmax, snr, csr, snl, csl )

Description

The routine ?lasv2 computes the singular value decomposition of a 2-by-2 triangular matrix

On return, abs(ssmax) is the larger singular value, abs(ssmin) is the smaller singular value,and (csl,snl) and (csr,snr) are the left and right singular vectors for abs(ssmax), giving thedecomposition

Input Parameters

REAL for slasv2f, g, hDOUBLE PRECISION for dlasv2.The (1,1), (1,2) and (2,2) elements of the 2-by-2 matrix,respectively.

Output Parameters

REAL for slasv2ssmin, ssmax

1491


DOUBLE PRECISION for dlasv2.abs(ssmin) and abs(ssmax) is the smaller and the largersingular value, respectively.

REAL for slasv2snl, cslDOUBLE PRECISION for dlasv2.The vector (csl, snl) is a unit left singular vector for thesingular value abs(ssmax).

REAL for slasv2snr, csrDOUBLE PRECISION for dlasv2.The vector (csr, snr) is a unit right singular vector for thesingular value abs(ssmax).

Application Notes

Any input parameter may be aliased with any output parameter.

Barring over/underflow and assuming a guard digit in subtraction, all output quantities arecorrect to within a few units in the last place (ulps).

In ieee arithmetic, the code works correctly if one matrix element is infinite. Overflow will notoccur unless the largest singular value itself overflows or is within a few ulps of overflow. (Onmachines with partial overflow, like the Cray, overflow may occur if the largest singular valueis within a factor of 2 of overflow.) Underflow is harmless if underflow is gradual. Otherwise,results may correspond to a matrix modified by perturbations of size near the underflowthreshold.

?laswpPerforms a series of row interchanges on a generalrectangular matrix.

Syntax

call slaswp( n, a, lda, k1, k2, ipiv, incx )

call dlaswp( n, a, lda, k1, k2, ipiv, incx )

call claswp( n, a, lda, k1, k2, ipiv, incx )

call zlaswp( n, a, lda, k1, k2, ipiv, incx )

1492


Description

The routine performs a series of row interchanges on the matrix A. One row interchange isinitiated for each of rows k1 through k2 of A.

Input Parameters


REAL for slaswpaDOUBLE PRECISION for dlaswpCOMPLEX for claswpCOMPLEX*16 for zlaswp.Array, DIMENSION (lda, n).On entry, the matrix of column dimension n to which therow interchanges will be applied.


INTEGER. The first element of ipiv for which a rowinterchange will be done.

k1

INTEGER. The last element of ipiv for which a rowinterchange will be done.

k2

INTEGER.ipivArray, DIMENSION (k2 * abs(incx)).The vector of pivot indices. Only the elements in positionsk1 through k2 of ipiv are accessed.ipiv(k) = l implies rows k and l are to be interchanged.

INTEGER. The increment between successive values of ipiv.If ipiv is negative, the pivots are applied in reverse order.

incx

Output Parameters

On exit, the permuted matrix.a

1493


?lasy2Solves the Sylvester matrix equation where thematrices are of order 1 or 2.

Syntax

call slasy2( ltranl, ltranr, isgn, n1, n2, tl, ldtl, tr, ldtr, b, ldb, scale,x, ldx, xnorm, info )

call dlasy2( ltranl, ltranr, isgn, n1, n2, tl, ldtl, tr, ldtr, b, ldb, scale,x, ldx, xnorm, info )

Description

The routine solves for the n1-by-n2 matrix X, 1 ≤ n1, n2 ≤ 2, in

op(TL) * X + isgn * X *op(TR) = scale*B,

where

TL is n1-by-n1,

TR is n2-by-n2,

B is n1-by-n2,

and isgn = 1 or -1. op(T) = T or T', where T' denotes the transpose of T.

Input Parameters

LOGICAL.ltranlOn entry, ltranl specifies the op(TL):= .FALSE., op(TL) = TL,= .TRUE., op(TL) = TL'.

LOGICAL.ltranrOn entry, ltranr specifies the op(TR):= .FALSE., op(TR) = TR,= .TRUE., op(TR) = TR'.

INTEGER. On entry, isgn specifies the sign of the equationas described before. isgn may only be 1 or -1.

isgn

INTEGER. On entry, n1 specifies the order of matrix TL.n1n1 may only be 0, 1 or 2.

1494


INTEGER. On entry, n2 specifies the order of matrix TR.n2n2 may only be 0, 1 or 2.

REAL for slasy2tlDOUBLE PRECISION for dlasy2.Array, DIMENSION (ldtl,2).On entry, tl contains an n1-by-n1 matrix TL.

INTEGER.The leading dimension of the matrix TL.ldtl

ldtl ≥ max(1,n1).

REAL for slasy2trDOUBLE PRECISION for dlasy2.Array, DIMENSION (ldtr,2). On entry, tr contains ann2-by-n2 matrix TR.

INTEGER. The leading dimension of the matrix tr.ldtr

ldtr ≥ max(1,n2).

REAL for slasy2bDOUBLE PRECISION for dlasy2.Array, DIMENSION (ldb,2). On entry, the n1-by-n2 matrixb contains the right-hand side of the equation.

INTEGER. The leading dimension of the matrix b.ldb

ldb ≥ max(1,n1).

INTEGER. The leading dimension of the output matrix x.ldx

ldx ≥ max(1,n1).

Output Parameters

REAL for slasy2scaleDOUBLE PRECISION for dlasy2.On exit, scale contains the scale factor.scale is chosen less than or equal to 1 to prevent thesolution overflowing.

REAL for slasy2xDOUBLE PRECISION for dlasy2.Array, DIMENSION (ldx,2). On exit, x contains the n1-by-n2solution.

REAL for slasy2xnorm

1495


DOUBLE PRECISION for dlasy2.On exit, xnorm is the infinity-norm of the solution.

INTEGER. On exit, info is set to 0: successful exit. 1: TLand TR have too close eigenvalues, so TL or TR is perturbedto get a nonsingular equation.

info


?lasyfComputes a partial factorization of a real/complexsymmetric matrix, using the diagonal pivotingmethod.

Syntax

call slasyf( uplo, n, nb, kb, a, lda, ipiv, w, ldw, info )

call dlasyf( uplo, n, nb, kb, a, lda, ipiv, w, ldw, info )

call clasyf( uplo, n, nb, kb, a, lda, ipiv, w, ldw, info )

call zlasyf( uplo, n, nb, kb, a, lda, ipiv, w, ldw, info )

Description

The routine ?lasyf computes a partial factorization of a real/complex symmetric matrix A usingthe Bunch-Kaufman diagonal pivoting method. The partial factorization has the form:

1496


where the order of D is at most nb.

The actual order is returned in the argument kb, and is either nb or nb-1, or n if n ≤ nb.

This is an auxiliary routine called by ?sytrf. It uses blocked code (calling Level 3 BLAS) toupdate the submatrix A11 (if uplo = 'U') or A22 (if uplo = 'L').

Input Parameters

CHARACTER*1.uploSpecifies whether the upper or lower triangular part of thesymmetric matrix A is stored:= 'U': Upper triangular= 'L': Lower triangular


INTEGER. The maximum number of columns of the matrixA that should be factored. nb should be at least 2 to allowfor 2-by-2 pivot blocks.

nb

REAL for slasyfaDOUBLE PRECISION for dlasyfCOMPLEX for clasyfCOMPLEX*16 for zlasyf.Array, DIMENSION (lda, n). On entry, the symmetric matrixA. If uplo = 'U', the leading n-by-n upper triangular partof A contains the upper triangular part of the matrix A, andthe strictly lower triangular part of A is not referenced. Ifuplo = 'L', the leading n-by-n lower triangular part of Acontains the lower triangular part of the matrix A, and thestrictly upper triangular part of A is not referenced.

INTEGER. The leading dimension of the array a. lda ≥max(1,n).

lda

REAL for slasyfwDOUBLE PRECISION for dlasyfCOMPLEX for clasyfCOMPLEX*16 for zlasyf.Workspace array, DIMENSION (ldw, nb).

1497


INTEGER. The leading dimension of the array w. ldw ≥max(1,n).

ldw

Output Parameters

INTEGER. The number of columns of A that were actually

factored kb is either nb-1 or nb, or n if n ≤ nb.

kb

On exit, A contains details of the partial factorization.a

INTEGER. Array, DIMENSION (n ). Details of the interchangesand the block structure of D.

ipiv

If uplo = 'U', only the last kb elements of ipiv are set;if uplo = 'L', only the first kb elements are set.If ipiv(k) > 0, then rows and columns k and ipiv(k) wereinterchanged and D(k, k) is a 1-by-1 diagonal block.If uplo = 'U' and ipiv(k) = ipiv(k-1) < 0, then rowsand columns k-1 and -ipiv(k) were interchanged andD(k-1:k, k-1:k) is a 2-by-2 diagonal block.If uplo = 'L' and ipiv(k) = ipiv(k+1) < 0, then rowsand columns k+1 and -ipiv(k) were interchanged andD(k:k+1, k:k+1) is a 2-by-2 diagonal block.

INTEGER.info= 0: successful exit> 0: if info = k, D( k, k) is exactly zero. Thefactorization has been completed, but the block diagonalmatrix D is exactly singular.

?lahefComputes a partial factorization of a complexHermitian indefinite matrix, using the diagonalpivoting method.

Syntax

call clahef( uplo, n, nb, kb, a, lda, ipiv, w, ldw, info )

call zlahef( uplo, n, nb, kb, a, lda, ipiv, w, ldw, info )

1498

Description

The routine ?lahef computes a partial factorization of a complex Hermitian matrix A, usingthe Bunch-Kaufman diagonal pivoting method. The partial factorization has the form:

where the order of D is at most nb.

The actual order is returned in the argument kb, and is either nb or nb-1, or n if n ≤ nb.

Note that U' denotes the conjugate transpose of U.

This is an auxiliary routine called by ?hetrf. It uses blocked code (calling Level 3 BLAS) toupdate the submatrix A11 (if uplo = 'U') or A22 (if uplo = 'L').

Input Parameters

CHARACTER*1.uploSpecifies whether the upper or lower triangular part of theHermitian matrix A is stored:= 'U': upper triangular= 'L': lower triangular


INTEGER. The maximum number of columns of the matrixA that should be factored. nb should be at least 2 to allowfor 2-by-2 pivot blocks.

nb

COMPLEX for clahefaCOMPLEX*16 for zlahef.

1499


Array, DIMENSION (lda, n).On entry, the Hermitian matrix A.If uplo = 'U', the leading n-by-n upper triangular part ofA contains the upper triangular part of the matrix A, andthe strictly lower triangular part of A is not referenced.If uplo = 'L', the leading n-by-n lower triangular part ofA contains the lower triangular part of the matrix A, and thestrictly upper triangular part of A is not referenced.


lda

COMPLEX for clahefwCOMPLEX*16 for zlahef.Workspace array, DIMENSION (ldw, nb).

INTEGER. The leading dimension of the array w. ldw ≥max(1,n).

ldw

Output Parameters

INTEGER. The number of columns of A that were actually

factored kb is either nb-1 or nb, or n if n ≤ nb.

kb

On exit, A contains details of the partial factorization.a

INTEGER.ipivArray, DIMENSION (n ). Details of the interchanges and theblock structure of D.If uplo = 'U', only the last kb elements of ipiv are set;if uplo = 'L', only the first kb elements are set.If ipiv(k) > 0, then rows and columns k and ipiv(k) areinterchanged and D(k, k) is a 1-by-1 diagonal block.If uplo = 'U' and ipiv(k) = ipiv(k-1) < 0, then rowsand columns k-1 and -ipiv(k) are interchanged andD(k-1:k, k-1:k) is a 2-by-2 diagonal block.If uplo = 'L' and ipiv(k) = ipiv(k+1) < 0, then rowsand columns k+1 and -ipiv(k) are interchanged and D(k:k+1, k:k+1) is a 2-by-2 diagonal block.

INTEGER.info= 0: successful exit

1500

> 0: if info = k, D(k, k) is exactly zero. The factorizationhas been completed, but the block diagonal matrix D isexactly singular.

?latbsSolves a triangular banded system of equations.

Syntax

call slatbs( uplo, trans, diag, normin, n, kd, ab, ldab, x, scale, cnorm,info )

call dlatbs( uplo, trans, diag, normin, n, kd, ab, ldab, x, scale, cnorm,info )

call clatbs( uplo, trans, diag, normin, n, kd, ab, ldab, x, scale, cnorm,info )

call zlatbs( uplo, trans, diag, normin, n, kd, ab, ldab, x, scale, cnorm,info )

Description

The routine solves one of the triangular systems

A*x = s*b, or AT*x = s*b, or AH*x = s*b (for complex flavors)

with scaling to prevent overflow, where A is an upper or lower triangular band matrix. Here AT

denotes the transpose of A, AH denotes the conjugate transpose of A, x and b are n-elementvectors, and s is a scaling factor, usually less than or equal to 1, chosen so that the componentsof x will be less than the overflow threshold. If the unscaled problem will not cause overflow,the Level 2 BLAS routine ?tbsv is called. If the matrix A is singular (A(j, j) = 0 for somej), then s is set to 0 and a non-trivial solution to A*x = 0 is returned.

Input Parameters

CHARACTER*1.uploSpecifies whether the matrix A is upper or lower triangular.= 'U': upper triangular= 'L': lower triangular

CHARACTER*1.transSpecifies the operation applied to A.

1501


= 'N': solve A*x = s*b (no transpose)= 'T': solve AT*x = s*b (transpose)= 'C': solve AH*x = s*b (conjugate transpose)

CHARACTER*1.diagSpecifies whether or not the matrix A is unit triangular= 'N': non-unit triangular= 'U': unit triangular

CHARACTER*1.norminSpecifies whether cnorm has been set or not.= 'Y': cnorm contains the column norms on entry;= 'N': cnorm is not set on entry. On exit, the norms willbe computed and stored in cnorm.


INTEGER. The number of subdiagonals or superdiagonals in

the triangular matrix A. kb ≥ 0.

kd

REAL for slatbsabDOUBLE PRECISION for dlatbsCOMPLEX for clatbsCOMPLEX*16 for zlatbs.Array, DIMENSION (ldab, n).The upper or lower triangular band matrix A, stored in thefirst kb+1 rows of the array. The j-th column of A is storedin the j-th column of the array ab as follows:if uplo = 'U', ab(kd+1+i -j,j) = A(i ,j) for max(1,

j-kd) ≤ i ≤ j;

if uplo = 'L', ab(1+i -j,j) = A(i ,j) for j ≤ i ≤min(n, j+kd).

INTEGER. The leading dimension of the array ab. ldab ≥kb+1.

ldab

REAL for slatbsxDOUBLE PRECISION for dlatbsCOMPLEX for clatbsCOMPLEX*16 for zlatbs.Array, DIMENSION (n).On entry, the right hand side b of the triangular system.

1502


REAL for slatbs/clatbscnormDOUBLE PRECISION for dlatbs/zlatbs.Array, DIMENSION (n).If NORMIN = 'Y', cnorm is an input argument and cnorm(j)contains the norm of the off-diagonal part of the j-th columnof A.If trans = 'N', cnorm(j) must be greater than or equalto the infinity-norm, and if trans = 'T' or 'C' , cnorm(j)must be greater than or equal to the 1-norm.

Output Parameters

REAL for slatbs/clatbsscaleDOUBLE PRECISION for dlatbs/zlatbs.The scaling factor s for the triangular system as describedabove. If scale = 0, the matrix A is singular or badlyscaled, and the vector x is an exact or approximate solutionto Ax = 0.

If normin = 'N', cnorm is an output argument andcnorm(j) returns the 1-norm of the off-diagonal part of thej-th column of A.

cnorm

INTEGER.info= 0: successful exit< 0: if info = -k, the k-th argument had an illegal value

?latdfUses the LU factorization of the n-by-n matrixcomputed by ?getc2 and computes a contributionto the reciprocal Dif-estimate.

Syntax

call slatdf( ijob, n, z, ldz, rhs, rdsum, rdscal, ipiv, jpiv )

call dlatdf( ijob, n, z, ldz, rhs, rdsum, rdscal, ipiv, jpiv )

call clatdf( ijob, n, z, ldz, rhs, rdsum, rdscal, ipiv, jpiv )

call zlatdf( ijob, n, z, ldz, rhs, rdsum, rdscal, ipiv, jpiv )

1503

Description

The routine ?latdf uses the LU factorization of the n-by-n matrix Z computed by ?getc2 andcomputes a contribution to the reciprocal Dif-estimate by solving Z*x = b for x, and choosingthe right-hand side b such that the norm of x is as large as possible. On entry rhs = b holdsthe contribution from earlier solved sub-systems, and on return rhs = x.

The factorization of Z returned by ?getc2 has the form Z = P*L*U*Q, where p and Q arepermutation matrices. L is lower triangular with unit diagonal elements and U is upper triangular.

Input Parameters

INTEGER.ijobijob = 2: First compute an approximative null-vector e ofZ using ?gecon, e is normalized, and solve forZx = ±e -f with the sign giving the greater value of2-norm(x). This option is about 5 times as expensive asdefault.

ijob ≠ 2 (default): Local look ahead strategy where allentries of the right-hand side b is chosen as either +1 or -1.

INTEGER. The number of columns of the matrix Z.n

REAL for slatdf/clatdfzDOUBLE PRECISION for dlatdf/zlatdf.Array, DIMENSION (ldz, n)On entry, the LU part of the factorization of the n-by-nmatrix Z computed by ?getc2: Z = P*L*U*Q.

INTEGER. The leading dimension of the array Z. lda ≥max(1, n).

ldz

REAL for slatdf/clatdfrhsDOUBLE PRECISION for dlatdf/zlatdf.Array, DIMENSION (n).On entry, rhs contains contributions from other subsystems.

REAL for slatdf/clatdfrdsumDOUBLE PRECISION for dlatdf/zlatdf.

1504


On entry, the sum of squares of computed contributions tothe Dif-estimate under computation by ?tgsyL, where thescaling factor rdscal has been factored out. If trans ='T', rdsum is not touched.Note that rdsum only makes sense when ?tgsy2 is calledby ?tgsyL.

REAL for slatdf/clatdfrdscalDOUBLE PRECISION for dlatdf/zlatdf.On entry, scaling factor used to prevent overflow in rdsum.If trans = T', rdscal is not touched.Note that rdscal only makes sense when ?tgsy2 is calledby ?tgsyL.

INTEGER.ipivArray, DIMENSION (n).

The pivot indices; for 1 ≤ i ≤ n, row i of the matrix hasbeen interchanged with row ipiv(i).

INTEGER.jpivArray, DIMENSION (n).

The pivot indices; for 1 ≤ j ≤ n, column j of the matrixhas been interchanged with column jpiv(j).

Output Parameters

On exit, rhs contains the solution of the subsystem withentries according to the value of ijob.

rhs

On exit, the corresponding sum of squares updated withthe contributions from the current sub-system.

rdsum

If trans = 'T', rdsum is not touched.

On exit, rdscal is updated with respect to the currentcontributions in rdsum.

rdscal

If trans = 'T', rdscal is not touched.

1505


?latpsSolves a triangular system of equations with thematrix held in packed storage.

Syntax

call slatps( uplo, trans, diag, normin, n, ap, x, scale, cnorm, info )

call dlatps( uplo, trans, diag, normin, n, ap, x, scale, cnorm, info )

call clatps( uplo, trans, diag, normin, n, ap, x, scale, cnorm, info )

call zlatps( uplo, trans, diag, normin, n, ap, x, scale, cnorm, info )

Description

The routine ?latps solves one of the triangular systems


with scaling to prevent overflow, where A is an upper or lower triangular matrix stored in packedform. Here AT denotes the transpose of A, AH denotes the conjugate transpose of A, x and b aren-element vectors, and s is a scaling factor, usually less than or equal to 1, chosen so that thecomponents of x will be less than the overflow threshold. If the unscaled problem does notcause overflow, the Level 2 BLAS routine ?tpsv is called. If the matrix A is singular (A(j, j)= 0 for some j), then s is set to 0 and a non-trivial solution to Ax = 0 is returned.

Input Parameters

CHARACTER*1.uploSpecifies whether the matrix A is upper or lower triangular.= 'U': upper triangular= 'L': uower triangular

CHARACTER*1.transSpecifies the operation applied to A.= 'N': solve A*x = s*b (no transpose)= 'T': solve AT*x = s*b (transpose)= 'C': solve AH*x = s*b (conjugate transpose)

CHARACTER*1.diagSpecifies whether or not the matrix A is unit triangular.= 'N': non-unit triangular

1506


= 'U': unit triangular



REAL for slatpsapDOUBLE PRECISION for dlatpsCOMPLEX for clatpsCOMPLEX*16 for zlatps.Array, DIMENSION (n(n+1)/2).The upper or lower triangular matrix A, packed columnwisein a linear array. The j-th column of A is stored in the arrayap as follows:

if uplo = 'U', ap(i + (j-1)j/2) = A(i,j) for 1≤ i ≤j;if uplo = 'L', ap(i + (j-1)(2n-j)/2) = A(i, j) for

j≤i≤n.

REAL for slatps DOUBLE PRECISION for dlatpsxCOMPLEX for clatpsCOMPLEX*16 for zlatps.Array, DIMENSION (n)On entry, the right hand side b of the triangular system.

REAL for slatps/clatpscnormDOUBLE PRECISION for dlatps/zlatps.Array, DIMENSION (n).If normin = 'Y', cnorm is an input argument and cnorm(j)contains the norm of the off-diagonal part of the j-th columnof A.If trans = 'N', cnorm(j) must be greater than or equalto the infinity-norm, and if trans = 'T' or 'C' , cnorm(j)must be greater than or equal to the 1-norm.

1507


Output Parameters

On exit, x is overwritten by the solution vector x.x

REAL for slatps/clatpsscaleDOUBLE PRECISION for dlatps/zlatps.The scaling factor s for the triangular system as describedabove.If scale = 0, the matrix A is singular or badly scaled, andthe vector x is an exact or approximate solution to A*x =0.


cnorm


?latrdReduces the first nb rows and columns of asymmetric/Hermitian matrix A to real tridiagonalform by an orthogonal/unitary similaritytransformation.

Syntax

call slatrd( uplo, n, nb, a, lda, e, tau, w, ldw )

call dlatrd( uplo, n, nb, a, lda, e, tau, w, ldw )

call clatrd( uplo, n, nb, a, lda, e, tau, w, ldw )

call zlatrd( uplo, n, nb, a, lda, e, tau, w, ldw )

Description

The routine ?latrd reduces nb rows and columns of a real symmetric or complex Hermitianmatrix A to symmetric/Hermitian tridiagonal form by an orthogonal/unitary similaritytransformation Q' A Q, and returns the matrices V and W which are needed to apply thetransformation to the unreduced part of A.

1508


If uplo = 'U', ?latrd reduces the last nb rows and columns of a matrix, of which the uppertriangle is supplied;

if uplo = 'L', ?latrd reduces the first nb rows and columns of a matrix, of which the lowertriangle is supplied.

This is an auxiliary routine called by ?sytrd/?hetrd.

Input Parameters

CHARACTER*1.uploSpecifies whether the upper or lower triangular part of thesymmetric/Hermitian matrix A is stored:= 'U': upper triangular= 'L': lower triangular


INTEGER. The number of rows and columns to be reduced.nb

REAL for slatrdaDOUBLE PRECISION for dlatrdCOMPLEX for clatrdCOMPLEX*16 for zlatrd.Array, DIMENSION (lda, n).On entry, the symmetric/Hermitian matrix AIf uplo = 'U', the leading n-by-n upper triangular part ofA contains the upper triangular part of the matrix A, andthe strictly lower triangular part of A is not referenced.If uplo = 'L', the leading n-by-n lower triangular part ofA contains the lower triangular part of the matrix A, and thestrictly upper triangular part of A is not referenced.

INTEGER. The leading dimension of the array a. lda ≥(1,n).

lda

INTEGER.LDW

The leading dimension of the output array w. ldw ≥max(1,n).

1509


Output Parameters

On exit, if uplo = 'U', the last nb columns have beenreduced to tridiagonal form, with the diagonal elementsoverwriting the diagonal elements of A; the elements above

a

the diagonal with the array tau, represent theorthogonal/unitary matrix Q as a product of elementaryreflectors;if uplo = 'L', the first nb columns have been reduced totridiagonal form, with the diagonal elements overwriting thediagonal elements of a; the elements below the diagonalwith the array tau, represent the orthogonal/unitary matrixQ as a product of elementary reflectors.

REAL for slatrd/clatrdeDOUBLE PRECISION for dlatrd/zlatrd.If uplo = 'U', e(n-nb:n-1) contains the superdiagonalelements of the last nb columns of the reduced matrix;if uplo = 'L', e(1:nb) contains the subdiagonal elementsof the first nb columns of the reduced matrix.

REAL for slatrdtauDOUBLE PRECISION for dlatrdCOMPLEX for clatrdCOMPLEX*16 for zlatrd.Array, DIMENSION (lda, n).The scalar factors of the elementary reflectors, stored intau(n-nb:n-1) if uplo = 'U', and in tau(1:nb) if uplo= 'L'.

REAL for slatrdwDOUBLE PRECISION for dlatrdCOMPLEX for clatrdCOMPLEX*16 for zlatrd.Array, DIMENSION (lda, n).The n-by-nb matrix W required to update the unreduced partof A.

Application Notes

If uplo = 'U', the matrix Q is represented as a product of elementary reflectors

1510


Q = H(n ) H(n-1) . . . H(n-nb+1)


H(i) = I - tau*v*v'

where tau is a real/complex scalar, and v is a real/complex vector with v(i:n) = 0 and v(i-1)= 1; v(1: i-1) is stored on exit in a(1: i-1, i), and tau in tau(i-1).

If uplo = 'L', the matrix Q is represented as a product of elementary reflectors

Q = H(1) H(2) . . . H(nb)

Each H(i) has the form H(i) = I - tau*v*v'

where tau is a real/complex scalar, and v is a real/complex vector with v(1: i) = 0 andv(i+1) = 1; v( i+1:n) is stored on exit in a(i+1:n, i), and tau in tau(i).

The elements of the vectors v together form the n-by-nb matrix V which is needed, with W, toapply the transformation to the unreduced part of the matrix, using a symmetric/Hermitianrank-2k update of the form:

A := A - VW' - WV'.

The contents of a on exit are illustrated by the following examples with n = 5 and nb = 2:

where d denotes a diagonal element of the reduced matrix, a denotes an element of the originalmatrix that is unchanged, and vi denotes an element of the vector defining H(i).

1511


?latrsSolves a triangular system of equations with thescale factor set to prevent overflow.

Syntax

call slatrs( uplo, trans, diag, normin, n, a, lda, x, scale, cnorm, info )

call dlatrs( uplo, trans, diag, normin, n, a, lda, x, scale, cnorm, info )

call clatrs( uplo, trans, diag, normin, n, a, lda, x, scale, cnorm, info )

call zlatrs( uplo, trans, diag, normin, n, a, lda, x, scale, cnorm, info )

Description

The routine solves one of the triangular systems


with scaling to prevent overflow. Here A is an upper or lower triangular matrix, AT denotes thetranspose of A, AH denotes the conjugate transpose of A, x and b are n-element vectors, and s

is a scaling factor, usually less than or equal to 1, chosen so that the components of x will beless than the overflow threshold. If the unscaled problem will not cause overflow, the Level 2BLAS routine ?trsv is called. If the matrix A is singular (A(j,j) = 0 for some j), then s isset to 0 and a non-trivial solution to Ax = 0 is returned.

Input Parameters

CHARACTER*1.uploSpecifies whether the matrix A is upper or lower triangular.= 'U': Upper triangular= 'L': Lower triangular

CHARACTER*1.transSpecifies the operation applied to A.= 'N': solve A*x = s*b (no transpose)= 'T': solve AT*x = s*b (transpose)= 'C': solve AH*x = s*b (conjugate transpose)

CHARACTER*1.diagSpecifies whether or not the matrix A is unit triangular.= 'N': non-unit triangular= 'N': non-unit triangular

1512


CHARACTER*1.norminSpecifies whether cnorm has been set or not.= 'Y': cnorm contains the column norms on entry;= 'N': cnorm is not set on entry. On exit, the norms will be computed and stored in cnorm.

INTEGER. The order of the matrix A. n ≥ 0n

REAL for slatrsaDOUBLE PRECISION for dlatrsCOMPLEX for clatrsCOMPLEX*16 for zlatrs.Array, DIMENSION (lda, n). Contains the triangular matrixA.If uplo = 'U', the leading n-by-n upper triangular part ofthe array a contains the upper triangular matrix, and thestrictly lower triangular part of A is not referenced.If uplo = 'L', the leading n-by-n lower triangular part ofthe array a contains the lower triangular matrix, and thestrictly upper triangular part of A is not referenced.If diag = 'U', the diagonal elements of A are also notreferenced and are assumed to be 1.

INTEGER. The leading dimension of the array a. lda ≥max(1, n).

lda

REAL for slatrsxDOUBLE PRECISION for dlatrsCOMPLEX for clatrsCOMPLEX*16 for zlatrs.Array, DIMENSION (n).On entry, the right hand side b of the triangular system.

REAL for slatrs/clatrscnormDOUBLE PRECISION for dlatrs/zlatrs.Array, DIMENSION (n).If normin = 'Y', cnorm is an input argument and cnorm(j) contains the norm of the off-diagonal part of the j-thcolumn of A.

1513


If trans = 'N', cnorm (j) must be greater than or equalto the infinity-norm, and if trans = 'T' or 'C', cnorm(j)must be greater than or equal to the 1-norm.

Output Parameters

On exit, x is overwritten by the solution vector x.x

REAL for slatrs/clatrsscaleDOUBLE PRECISION for dlatrs/zlatrs.Array, DIMENSION (lda, n). The scaling factor s for thetriangular system as described above.If scale = 0, the matrix A is singular or badly scaled, andthe vector x is an exact or approximate solution to Ax =0.


cnorm


Application Notes

A rough bound on x is computed; if that is less than overflow, ?trsv is called, otherwise,specific code is used which checks for possible overflow or divide-by-zero at every operation.

A columnwise scheme is used for solving Ax = b. The basic algorithm if A is lower triangularis

x[1:n] := b[1:n]

for j = 1, ..., n

x(j) := x(j) / A(j,j)

x[j+1:n] := x[j+1:n] - x(j)*a[j+1:n,j]

end

Define bounds on the components of x after j iterations of the loop:

M(j) = bound on x[1:j]

G(j) = bound on x[j+1:n]

1514


Initially, let M(0) = 0 and G(0) = max{x(i), i=1,...,n}.

Then for iteration j+1 we have

M(j+1) ≤ G(j) / | a(j+1,j+1)|

G(j+1) ≤ G(j) + M(j+1)*| a[j+2:n,j+1]|

≤ G(j)(1 + cnorm(j+1)/ | a(j+1,j+1)|,

where cnorm(j+1) is greater than or equal to the infinity-norm of column j+1 of a, not countingthe diagonal. Hence

and

Since |x(j)| ≤ M(j), we use the Level 2 BLAS routine ?trsv if the reciprocal of the largestM(j), j=1,..,n, is larger than max(underflow, 1/overflow).

The bound on x(j) is also used to determine when a step in the columnwise method can beperformed without fear of overflow. If the computed bound is greater than a large constant, xis scaled to prevent overflow, but if the bound overflows, x is set to 0, x(j) to 1, and scale to0, and a non-trivial solution to Ax = 0 is found.

Similarly, a row-wise scheme is used to solve ATx = b or AHx = b. The basic algorithm for Aupper triangular is

for j = 1, ..., n

x(j) := ( b(j) - A[1:j-1,j]' x[1:j-1]) / A(j,j)

end

We simultaneously compute two bounds

1515


G(j) = bound on ( b(i) - A[1:i-1,i]'*x[1:i-1]), 1≤ i≤ j

M(j) = bound on x(i), 1≤ i≤ j

The initial values are G(0) = 0, M(0) = max{ b(i), i=1,..,n}, and we add the constraint

G(j) ≥ G(j-1) and M(j) ≥ M(j-1) for j ≥ 1.

Then the bound on x(j) is

M(j) ≤ M(j-1) *(1 + cnorm(j)) / | A(j,j)|

and we can safely call ?trsv if 1/M(n) and 1/G(n) are both greater than max(underflow,1/overflow).

?latrzFactors an upper trapezoidal matrix by means oforthogonal/unitary transformations.

Syntax

call slatrz( m, n, l, a, lda, tau, work )

call dlatrz( m, n, l, a, lda, tau, work )

call clatrz( m, n, l, a, lda, tau, work )

call zlatrz( m, n, l, a, lda, tau, work )

Description

The routine ?latrz factors the m-by-(m+l) real/complex upper trapezoidal matrix

[ A1 A2 ] = [ A(1:m,1:m) A(1: m, n-l+1:n)]

as ( R 0 )* Z, by means of orthogonal/unitary transformations. Z is an (m+l)-by-(m+l)orthogonal/unitary matrix and R and A1 are m-by -m upper triangular matrices.

1516


Input Parameters



INTEGER. The number of columns of the matrix A containingthe meaningful part of the Householder vectors.

l

n-m ≥ l ≥ 0.

REAL for slatrzaDOUBLE PRECISION for dlatrzCOMPLEX for clatrzCOMPLEX*16 for zlatrz.Array, DIMENSION (lda, n).On entry, the leading m-by-n upper trapezoidal part of thearray a must contain the matrix to be factorized.

INTEGER. The leading dimension of the array a. lda ≥max(1,m).

lda

REAL for slatrzworkDOUBLE PRECISION for dlatrzCOMPLEX for clatrzCOMPLEX*16 for zlatrz.Workspace array, DIMENSION (m).

Output Parameters

On exit, the leading m-by-m upper triangular part of acontains the upper triangular matrix R, and elements n-l+1to n of the first m rows of a, with the array tau, representthe orthogonal/unitary matrix Z as a product of m elementaryreflectors.

a

REAL for slatrztauDOUBLE PRECISION for dlatrzCOMPLEX for clatrzCOMPLEX*16 for zlatrz.Array, DIMENSION (m).The scalar factors of the elementary reflectors.

1517


Application Notes

The factorization is obtained by Householder's method. The k-th transformation matrix, z(k),which is used to introduce zeros into the (m - k + 1)-th row of A, is given in the form

where

tau is a scalar and z( k ) is an l-element vector. tau and z( k ) are chosen to annihilate theelements of the k-th row of A2.

The scalar tau is returned in the k-th element of tau and the vector u( k ) in the k-th row ofA2, such that the elements of z( k ) are in a( k, l + 1 ), ..., a( k, n ).

The elements of r are returned in the upper triangular part of A1.

Z is given by

Z = Z( 1 ) Z( 2 ) ... Z( m ).

1518


?lauu2Computes the product U*UH or LH*L, where U andL are upper or lower triangular matrices (unblockedalgorithm).

Syntax

call slauu2( uplo, n, a, lda, info )

call dlauu2( uplo, n, a, lda, info )

call clauu2( uplo, n, a, lda, info )

call zlauu2( uplo, n, a, lda, info )

Description

The routine ?lauu2 computes the product U*U' or L'*L, where the triangular factor U or L isstored in the upper or lower triangular part of the array a.

If uplo = 'U' or 'u' , then the upper triangle of the result is stored, overwriting the factor Uin A.

If uplo = 'L' or 'l', then the lower triangle of the result is stored, overwriting the factor Lin A.

This is the unblocked form of the algorithm, calling BLAS Level 2 Routines.

Input Parameters

CHARACTER*1.uploSpecifies whether the triangular factor stored in the arraya is upper or lower triangular:= 'U': Upper triangular= 'L': Lower triangular

INTEGER. The order of the triangular factor U or L. n ≥ 0.n

REAL for slauu2aDOUBLE PRECISION for dlauu2COMPLEX for clauu2COMPLEX*16 for zlauu2.Array, DIMENSION (lda, n). On entry, the triangular factorU or L.

1519



lda

Output Parameters

On exit, if uplo = 'U', the upper triangle of A is overwrittenwith the upper triangle of the product U*U'; if uplo = 'L',the lower triangle of A is overwritten with the lower triangleof the product L'*L.

a


?lauumComputes the product U*UH or LH*L, where U andL are upper or lower triangular matrices (blockedalgorithm).

Syntax

call slauum( uplo, n, a, lda, info )

call dlauum( uplo, n, a, lda, info )

call clauum( uplo, n, a, lda, info )

call zlauum( uplo, n, a, lda, info )

Description

The routine ?lauum computes the product U*U' or L'*L, where the triangular factor U or L isstored in the upper or lower triangular part of the array a.

If uplo = 'U' or 'u', then the upper triangle of the result is stored, overwriting the factor Uin A.

If uplo = 'L' or 'l', then the lower triangle of the result is stored, overwriting the factor Lin A.

This is the blocked form of the algorithm, calling BLAS Level 3 Routines.

1520


Input Parameters

CHARACTER*1.uploSpecifies whether the triangular factor stored in the arraya is upper or lower triangular:= 'U': Upper triangular= 'L': Lower triangular

INTEGER. The order of the triangular factor U or L. n ≥ 0.n

REAL for slauumaDOUBLE PRECISION for dlauumCOMPLEX for clauumCOMPLEX*16 for zlauum .Array, DIMENSION (lda, n).On entry, the triangular factor U or L.


lda

Output Parameters

On exit, if uplo = 'U', the upper triangle of A is overwrittenwith the upper triangle of the product U*U'; if uplo = 'L',the lower triangle of A is overwritten with the lower triangleof the product L'*L.

a


?lazq3Checks for deflation, computes a shift and callsdqds.

Syntax

call slazq3( i0, n0, z, pp, dmin, sigma, desig, qmax, nfail, iter, ndiv, ieee,ttype, dmin1, dmin2, dn, dn1, dn2, tau )

call dlazq3( i0, n0, z, pp, dmin, sigma, desig, qmax, nfail, iter, ndiv, ieee,ttype, dmin1, dmin2, dn, dn1, dn2, tau )

1521


Description

The routine ?lazq3 checks for deflation, computes a shift (tau) and calls dqds. In case offailure, it changes shifts, and tries again until output is positive.

This routine is a thread safe version of ?lasq3 routine, which passes ttype, dmin1, dmin2, dn,dn1, dn2, and tau through the argument list in place of declaring them in a SAVE statement.

Input Parameters

INTEGER.i0First index.

INTEGER.n0Last index.

REAL for slasq3zDOUBLE PRECISION for dlasq3.Array, DIMENSION (4*n). z holds the qd array.

INTEGER.pppp=0 for ping, pp=1 for pong.

REAL for slazq3desigDOUBLE PRECISION for dlazq3.Lower order part of sigma.

REAL for slazq3qmaxDOUBLE PRECISION for dlazq3.Maximum value of q.

LOGICAL.ieeeFlag for IEEE or non-IEEE arithmetic (passed to the routine).

INTEGER. Shift type.ttype

REAL for slazq3dmin1DOUBLE PRECISION for dlazq3.Minimum value of d, excluding d(n0). Should be 0 on entryat the first iteration and should not be modified further.

REAL for slazq3dmin2DOUBLE PRECISION for dlazq3.Minimum value of d, excluding d(n0) and d(n0-1). Shouldbe 0 on entry at the first iteration and should not bemodified further.

REAL for slazq3dn

1522


DOUBLE PRECISION for dlazq3.Contains d(n). Should be 0 on entry at the first iterationand should not be modified further.

REAL for slazq3dn1DOUBLE PRECISION for dlazq3.Contains d(n-1). Should be 0 on entry at the first iterationand should not be modified further.

REAL for slazq3dn2DOUBLE PRECISION for dlazq3.Contains d(n-2). Should be 0 on entry at the first iterationand should not be modified further.

REAL for slazq3tauDOUBLE PRECISION for dlazq3.Shift value

Output Parameters

REAL for slazq3dminDOUBLE PRECISION for dlazq3.Minimum value of d.

REAL for slazq3sigmaDOUBLE PRECISION for dlazq3.Sum of shifts used in current segment.

Lower order part of sigma.desig

INTEGER.nfailNumber of times shift was too big.

INTEGER.iterNumber of iterations.

INTEGER.ndivNumber of divisions.

Shift type.ttype

Minimum value of d, excluding d(n0).dmin1

Minimum value of d, excluding d(n0) and d(n0-1).dmin2

d(n).dn

d(n-1).dn1

1523


d(n-2).dn2

Shift value.tau

?lazq4Computes an approximation to the smallesteigenvalue using values of d from the previoustransform.

Syntax

call slazq4( i0, n0, z, pp, n0in, dmin, dmin1, dmin2, dn, dn1, dn2, tau,ttype, g )

call dlazq4( i0, n0, z, pp, n0in, dmin, dmin1, dmin2, dn, dn1, dn2, tau,ttype, g )

Description

The routine computes an approximation tau to the smallest eigenvalue using values of d fromthe previous transform.

This routine is a thread safe version of ?lasq4 routine, which passes g through the argumentlist in place of declaring g in a SAVE statement.

Input Parameters

INTEGER.i0First index.

INTEGER.n0Last index.

REAL for slazq4zDOUBLE PRECISION for dlazq4.Array, DIMENSION (4*n).z holds the qd array.

INTEGER.pppp=0 for ping, pp=1 for pong.

INTEGER. The value of n0 at start of eigtest.n0in

REAL for slazq4dminDOUBLE PRECISION for dlazq4.

1524


Minimum value of d.

REAL for slazq4dmin1DOUBLE PRECISION for dlazq4.Minimum value of d, excluding d(n0).

REAL for slazq4dmin2DOUBLE PRECISION for dlazq4.Minimum value of d, excluding d(n0)and d(n0-1).

REAL for slazq4dnDOUBLE PRECISION for dlazq4.Contains d(n).

REAL for slazq4dn1DOUBLE PRECISION for dlazq4.Contains d(n-1).

REAL for slazq4dn2DOUBLE PRECISION for dlazq4.Contains d(n-2).

REAL for slazq4gDOUBLE PRECISION for dlazq4.Shift coefficient.

Output Parameters

REAL for slazq4tauDOUBLE PRECISION for dlazq4.Shift value.

INTEGER.ttypeShift type.

Shift coefficient.g

1525


?org2l/?ung2lGenerates all or part of the orthogonal/unitarymatrix Q from a QL factorization determined by?geqlf (unblocked algorithm).

Syntax

call sorg2l( m, n, k, a, lda, tau, work, info )

call dorg2l( m, n, k, a, lda, tau, work, info )

call cung2l( m, n, k, a, lda, tau, work, info )

call zung2l( m, n, k, a, lda, tau, work, info )

Description

The routine ?org2l/?ung2l generates an m-by-n real/complex matrix Q with orthonormalcolumns, which is defined as the last n columns of a product of k elementary reflectors of orderm:

Q = H(k) . . . H(2) H(1) as returned by ?geqlf.

Input Parameters

INTEGER. The number of rows of the matrix Q. m ≥ 0.m

INTEGER. The number of columns of the matrix Q. m ≥ n

≥ 0.

n


product defines the matrix Q. n ≥ k ≥ 0.

k

REAL for sorg2laDOUBLE PRECISION for dorg2lCOMPLEX for cung2lCOMPLEX*16 for zung2l.Array, DIMENSION (lda,n).On entry, the ( n -k+i)-th column must contain the vectorwhich defines the elementary reflector H(i), for i = 1,2,...,k, as returned by ?geqlf in the last k columns of its arrayargument A.

1526


INTEGER. The first dimension of the array a. lda ≥max(1,m).

lda

REAL for sorg2ltauDOUBLE PRECISION for dorg2lCOMPLEX for cung2lCOMPLEX*16 for zung2l.Array, DIMENSION (k).tau(i) must contain the scalar factor of the elementaryreflector H(i), as returned by ?geqlf.

REAL for sorg2lworkDOUBLE PRECISION for dorg2lCOMPLEX for cung2lCOMPLEX*16 for zung2l.Workspace array, DIMENSION (n).

Output Parameters

On exit, the m-by-n matrix Q.a

INTEGER.info= 0: successful exit< 0: if info = -i, the i-th argument has an illegal value

?org2r/?ung2rGenerates all or part of the orthogonal/unitarymatrix Q from a QR factorization determined by?geqrf (unblocked algorithm).

Syntax

call sorg2r( m, n, k, a, lda, tau, work, info )

call dorg2r( m, n, k, a, lda, tau, work, info )

call cung2r( m, n, k, a, lda, tau, work, info )

call zung2r( m, n, k, a, lda, tau, work, info )

1527

Description

The routine ?org2r/?ung2r generates an m-by-n real/complex matrix Q with orthonormalcolumns, which is defined as the first n columns of a product of k elementary reflectors of orderm

Q = H(1) H(2) . . . H(k)

as returned by ?geqrf.

Input Parameters


INTEGER. The number of columns of the matrix Q. m ≥ n

≥ 0.

n


product defines the matrix Q. n ≥ k ≥ 0.

k

REAL for sorg2raDOUBLE PRECISION for dorg2rCOMPLEX for cung2rCOMPLEX*16 for zung2r.Array, DIMENSION (lda, n).On entry, the i-th column must contain the vector whichdefines the elementary reflector H(i), for i = 1,2,..., k, asreturned by ?geqrf in the first k columns of its arrayargument a.

INTEGER. The first DIMENSION of the array a. lda ≥max(1,m).

lda

REAL for sorg2rtauDOUBLE PRECISION for dorg2rCOMPLEX for cung2rCOMPLEX*16 for zung2r.Array, DIMENSION (k).tau(i) must contain the scalar factor of the elementaryreflector H(i), as returned by ?geqrf.

REAL for sorg2rworkDOUBLE PRECISION for dorg2r

1528


COMPLEX for cung2rCOMPLEX*16 for zung2r.Workspace array, DIMENSION (n).

Output Parameters



?orgl2/?ungl2Generates all or part of the orthogonal/unitarymatrix Q from an LQ factorization determined by?gelqf (unblocked algorithm).

Syntax

call sorgl2( m, n, k, a, lda, tau, work, info )

call dorgl2( m, n, k, a, lda, tau, work, info )

call cungl2( m, n, k, a, lda, tau, work, info )

call zungl2( m, n, k, a, lda, tau, work, info )

Description

The routine ?orgl2/?ungl2 generates a m-by-n real/complex matrix Q with orthonormal rows,which is defined as the first m rows of a product of k elementary reflectors of order n

Q = H(k) . . . H(2) H(1) or Q = H(k) ' . . . H(2)' H(1)'

as returned by ?gelqf.

Input Parameters


INTEGER. The number of columns of the matrix Q. n ≥ m.n


product defines the matrix Q. m ≥ k ≥ 0.

k

1529


REAL for sorgl2aDOUBLE PRECISION for dorgl2COMPLEX for cungl2COMPLEX*16 for zungl2.Array, DIMENSION (lda, n). On entry, the i-th row mustcontain the vector which defines the elementary reflectorH(i), for i = 1,2,..., k, as returned by ?gelqf inthe first k rows of its array argument a.


lda

REAL for sorgl2tauDOUBLE PRECISION for dorgl2COMPLEX for cungl2COMPLEX*16 for zungl2.Array, DIMENSION (k).tau(i) must contain the scalar factor of the elementaryreflector H(i), as returned by ?gelqf.

REAL for sorgl2workDOUBLE PRECISION for dorgl2COMPLEX for cungl2COMPLEX*16 for zungl2.Workspace array, DIMENSION (m).

Output Parameters


INTEGER.info= 0: successful exit< 0: if info = -i, the i-th argument has an illegal value.

1530

?orgr2/?ungr2Generates all or part of the orthogonal/unitarymatrix Q from an RQ factorization determined by?gerqf (unblocked algorithm).

Syntax

call sorgr2( m, n, k, a, lda, tau, work, info )

call dorgr2( m, n, k, a, lda, tau, work, info )

call cungr2( m, n, k, a, lda, tau, work, info )

call zungr2( m, n, k, a, lda, tau, work, info )

Description

The routine ?orgr2/?ungr2 generates an m-by-n real matrix Q with orthonormal rows, whichis defined as the last m rows of a product of k elementary reflectors of order n

Q = H(1) H(2) . . . H(k) or Q = H(1)' H(2)' . . . H(k)'

as returned by ?gerqf.

Input Parameters


INTEGER. The number of columns of the matrix Q. n ≥ mn

INTEGER.kThe number of elementary reflectors whose product defines

the matrix Q. m ≥ k ≥ 0.

REAL for sorgr2aDOUBLE PRECISION for dorgr2COMPLEX for cungr2COMPLEX*16 for zungr2.Array, DIMENSION (lda, n).On entry, the ( m- k+i)-th row must contain the vector whichdefines the elementary reflector H(i), for i = 1,2,...,k, as returned by ?gerqf in the last k rows of its arrayargument a.

1531



lda

REAL for sorgr2tauDOUBLE PRECISION for dorgr2COMPLEX for cungr2COMPLEX*16 for zungr2.Array, DIMENSION (k).tau(i) must contain the scalar factorof the elementary reflector H(i), as returned by ?gerqf.

REAL for sorgr2workDOUBLE PRECISION for dorgr2COMPLEX for cungr2COMPLEX*16 for zungr2.Workspace array, DIMENSION (m).

Output Parameters



?orm2l/?unm2lMultiplies a general matrix by theorthogonal/unitary matrix from a QL factorizationdetermined by ?geqlf (unblocked algorithm).

Syntax

call sorm2l( side, trans, m, n, k, a, lda, tau, c, ldc, work, info )

call dorm2l( side, trans, m, n, k, a, lda, tau, c, ldc, work, info )

call cunm2l( side, trans, m, n, k, a, lda, tau, c, ldc, work, info )

call zunm2l( side, trans, m, n, k, a, lda, tau, c, ldc, work, info )

Description

The routine ?orm2l/?unm2l overwrites the general real/complex m-by-n matrix C with

1532


Q*C if side = 'L' and trans = 'N', or

Q'*C if side = 'L' and trans = 'T' (for real flavors) or

trans = 'C' (for complex flavors), or

C*Q if side = 'R' and trans = 'N', or

C*Q' if side = 'R' and trans = 'T' (for real flavors) or

trans = 'C' (for complex flavors)

where Q is a real orthogonal or complex unitary matrix defined as the product of k elementaryreflectors

Q = H(k) . . . H(2) H(1)

as returned by ?geqlf. Q is of order m if side = 'L' and of order n if side = 'R'.

Input Parameters

CHARACTER*1.side= 'L': apply Q or Q' from the left= 'R': apply Q or Q' from the right

CHARACTER*1.trans= 'N': apply Q (no transpose)= 'T': apply Q' (transpose, for real flavors)= 'C': apply Q' (conjugate transpose, for complex flavors)

INTEGER. The number of rows of the matrix C. m ≥ 0.m

INTEGER. The number of columns of the matrix C. n ≥ 0.n

INTEGER. The number of elementary reflectors whoseproduct defines the matrix Q.

k

If side = 'L', m ≥ k ≥ 0;

if side = 'R', n ≥ k ≥ 0.

REAL for sorm2laDOUBLE PRECISION for dorm2lCOMPLEX for cunm2lCOMPLEX*16 for zunm2l.Array, DIMENSION (lda,k).

1533


The i-th column must contain the vector which defines theelementary reflector H(i), for i = 1,2,..., k, as returnedby ?geqlf in the last k columns of its array argument a.The array a is modified by the routine but restored on exit.


If side = 'L', lda ≥ max(1, m)

if side = 'R', lda ≥ max(1, n).

REAL for sorm2ltauDOUBLE PRECISION for dorm2lCOMPLEX for cunm2lCOMPLEX*16 for zunm2l.Array, DIMENSION (k). tau(i) must contain the scalar factorof the elementary reflector H(i), as returned by ?geqlf.

REAL for sorm2lcDOUBLE PRECISION for dorm2lCOMPLEX for cunm2lCOMPLEX*16 for zunm2l.Array, DIMENSION (LDc, n).On entry, the m-by-n matrix C.

INTEGER. The leading dimension of the array C. ldc ≥max(1,m).

ldc

REAL for sorm2lworkDOUBLE PRECISION for dorm2lCOMPLEX for cunm2lCOMPLEX*16 for zunm2l.Workspace array, DIMENSION:(n) if side = 'L',(m) if side = 'R'.

Output Parameters

On exit, c is overwritten by QC,or Q'C, or CQ', or CQ.c

INTEGER.info= 0: successful exit< 0: if info = -i, the i-th argument had an illegal value

1534

?orm2r/?unm2rMultiplies a general matrix by theorthogonal/unitary matrix from a QR factorizationdetermined by ?geqrf (unblocked algorithm).

Syntax

call sorm2r( side, trans, m, n, k, a, lda, tau, c, ldc, work, info )

call dorm2r( side, trans, m, n, k, a, lda, tau, c, ldc, work, info )

call cunm2r( side, trans, m, n, k, a, lda, tau, c, ldc, work, info )

call zunm2r( side, trans, m, n, k, a, lda, tau, c, ldc, work, info )

Description

The routine ?orm2r/?unm2r overwrites the general real/complex m-by-n matrix C with








Q = H(1) H(2) . . . H(k)

as returned by ?geqrf. Q is of order m if side = 'L' and of order n if side = 'R'.

Input Parameters


CHARACTER*1.trans= 'N': apply Q (no transpose)= 'T': apply Q' (transpose, for real flavors)

1535


= 'C': apply Q' (conjugate transpose, for complex flavors)




k

If side = 'L', m ≥ k ≥ 0;

if side = 'R', n ≥ k ≥ 0.

REAL for sorm2raDOUBLE PRECISION for dorm2rCOMPLEX for cunm2rCOMPLEX*16 for zunm2r.Array, DIMENSION (lda,k).The i-th column must contain the vector which defines theelementary reflector H(i), for i = 1,2,..., k, as returnedby ?geqrf in the first k columns of its array argument a.The array a is modified by the routine but restored on exit.


If side = 'L', lda ≥ max(1, m);

if side = 'R', lda ≥ max(1, n).

REAL for sorm2rtauDOUBLE PRECISION for dorm2rCOMPLEX for cunm2rCOMPLEX*16 for zunm2r.Array, DIMENSION (k).tau(i) must contain the scalar factor of the elementaryreflector H(i), as returned by ?geqrf.

REAL for sorm2rcDOUBLE PRECISION for dorm2rCOMPLEX for cunm2rCOMPLEX*16 for zunm2r.Array, DIMENSION (LDc, n).On entry, the m-by-n matrix C.

INTEGER. The leading dimension of the array c. ldc ≥max(1,m).

ldc

1536


REAL for sorm2rworkDOUBLE PRECISION for dorm2rCOMPLEX for cunm2rCOMPLEX*16 for zunm2r.Workspace array, DIMENSION(n) if side = 'L',(m) if side = 'R'.

Output Parameters

On exit, c is overwritten by QC, or Q'C, or CQ', or CQ.c


?orml2/?unml2Multiplies a general matrix by theorthogonal/unitary matrix from a LQ factorizationdetermined by ?gelqf (unblocked algorithm).

Syntax

call sorml2( side, trans, m, n, k, a, lda, tau, c, ldc, work, info )

call dorml2( side, trans, m, n, k, a, lda, tau, c, ldc, work, info )

call cunml2( side, trans, m, n, k, a, lda, tau, c, ldc, work, info )

call zunml2( side, trans, m, n, k, a, lda, tau, c, ldc, work, info )

Description

The routine ?orml2/?unml2 overwrites the general real/complex m-by-n matrix C with






1537




Q = H(k) . . . H(2) H(1) or Q = H(k)' . . . H(2)' H(1)'

as returned by ?gelqf. Q is of order m if side = 'L' and of order n if side = 'R'.

Input Parameters






k

If side = 'L', m ≥ k ≥ 0;

if side = 'r', n ≥ k ≥ 0.

REAL for sorml2aDOUBLE PRECISION for dorml2COMPLEX for cunml2COMPLEX*16 for zunml2.Array, DIMENSION(lda, m) if side = 'L',(lda, n) if side = 'R'The i-th row must contain the vector which defines theelementary reflector H(i), for i = 1,2,..., k, as returnedby ?gelqf in the first k rows of its array argument a. Thearray a is modified by the routine but restored on exit.

INTEGER. The leading dimension of the array a. lda ≥max(1,k).

lda

1538


REAL for sorml2tauDOUBLE PRECISION for dorml2COMPLEX for cunml2COMPLEX*16 for zunml2.Array, DIMENSION (k).tau(i) must contain the scalar factor of the elementaryreflector H(i), as returned by ?gelqf.

REAL for sorml2cDOUBLE PRECISION for dorml2COMPLEX for cunml2COMPLEX*16 for zunml2.Array, DIMENSION (ldc, n) On entry, the m-by-n matrix C.


ldc

REAL for sorml2workDOUBLE PRECISION for dorml2COMPLEX for cunml2COMPLEX*16 for zunml2.Workspace array, DIMENSION(n) if side = 'L',(m) if side = 'R'

Output Parameters



1539


?ormr2/?unmr2Multiplies a general matrix by theorthogonal/unitary matrix from a RQ factorizationdetermined by ?gerqf (unblocked algorithm).

Syntax

call sormr2( side, trans, m, n, k, a, lda, tau, c, ldc, work, info )

call dormr2( side, trans, m, n, k, a, lda, tau, c, ldc, work, info )

call cunmr2( side, trans, m, n, k, a, lda, tau, c, ldc, work, info )

call zunmr2( side, trans, m, n, k, a, lda, tau, c, ldc, work, info )

Description

The routine ?ormr2/?unmr2 overwrites the general real/complex m-by-n matrix C with








Q = H(1) H(2) . . . H(k) or Q = H(1)' H(2)' . . . H(k)'

as returned by ?gerqf. Q is of order m if side = 'L' and of order n if side = 'R'.

Input Parameters


CHARACTER*1.trans= 'N': apply Q (no transpose)= 'T': apply Q' (transpose, for real flavors)

1540


= 'C': apply Q' (conjugate transpose, for complex flavors)




k

If side = 'L', m ≥ k ≥ 0;

if side = 'r', n ≥ k ≥ 0.

REAL for sormr2aDOUBLE PRECISION for dormr2COMPLEX for cunmr2COMPLEX*16 for zunmr2.Array, DIMENSION(lda, m) if side = 'L',(lda, n) if side = 'r'The i-th row must contain the vector which defines theelementary reflector H(i), for i = 1,2,...,k, as returnedby ?gerqf in the last k rows of its array argument a. Thearray a is modified by the routine but restored on exit.

INTEGER.lda

The leading dimension of the array a. lda ≥ max(1,k).

REAL for sormr2tauDOUBLE PRECISION for dormr2COMPLEX for cunmr2COMPLEX*16 for zunmr2.Array, DIMENSION (k).tau(i) must contain the scalar factor of the elementaryreflector H(i), as returned by ?gerqf.

REAL for sormr2cDOUBLE PRECISION for dormr2COMPLEX for cunmr2COMPLEX*16 for zunmr2.Array, DIMENSION (ldc, n).On entry, the m-by-n matrix C.

1541



ldc

REAL for sormr2workDOUBLE PRECISION for dormr2COMPLEX for cunmr2COMPLEX*16 for zunmr2.Workspace array, DIMENSION(n) if side = 'L',(m) if side = 'R'

Output Parameters



?ormr3/?unmr3Multiplies a general matrix by theorthogonal/unitary matrix from a RZ factorizationdetermined by ?tzrzf (unblocked algorithm).

Syntax

call sormr3( side, trans, m, n, k, l, a, lda, tau, c, ldc, work, info )

call dormr3( side, trans, m, n, k, l, a, lda, tau, c, ldc, work, info )

call cunmr3( side, trans, m, n, k, l, a, lda, tau, c, ldc, work, info )

call zunmr3( side, trans, m, n, k, l, a, lda, tau, c, ldc, work, info )

Description

The routine ?ormr3/?unmr3 overwrites the general real/complex m-by-n matrix C with




1542






Q = H(1) H(2) . . . H(k)

as returned by ?tzrzf. Q is of order m if side = 'L' and of order n if side = 'R'.

Input Parameters






k

If side = 'L', m ≥ k ≥ 0;

if side = 'r' , n ≥ k ≥ 0.

INTEGER. The number of columns of the matrix A containingthe meaningful part of the Householder reflectors.

l

If side = 'L', m ≥ l ≥ 0,

if side = 'R', n ≥ l ≥ 0.

REAL for sormr3aDOUBLE PRECISION for dormr3COMPLEX for cunmr3COMPLEX*16 for zunmr3.Array, DIMENSION(lda, m) if side = 'L',

1543


(lda, n) if side = 'r'The i-th row must contain the vector which defines theelementary reflector H(i), for i = 1,2,...,k, as returnedby ?tzrzf in the last k rows of its array argument a. Thearray a is modified by the routine but restored on exit.

INTEGER.lda

The leading dimension of the array a. lda ≥ max(1,k).

REAL for sormr3tauDOUBLE PRECISION for dormr3COMPLEX for cunmr3COMPLEX*16 for zunmr3.Array, DIMENSION (k).tau(i) must contain the scalar factor of the elementaryreflector H(i), as returned by ?tzrzf.

REAL for sormr3cDOUBLE PRECISION for dormr3COMPLEX for cunmr3COMPLEX*16 for zunmr3.Array, DIMENSION (ldc, n).On entry, the m-by-n matrix C.

INTEGER. The leading dimension of the array C. ldc ≥max(1,m).

ldc

REAL for sormr3workDOUBLE PRECISION for dormr3COMPLEX for cunmr3COMPLEX*16 for zunmr3.Workspace array, DIMENSION(n) if side = 'L',(m) if side = 'R'.

Output Parameters



1544


?pbtf2Computes the Cholesky factorization of asymmetric/ Hermitian positive-definite band matrix(unblocked algorithm).

Syntax

call spbtf2( uplo, n, kd, ab, ldab, info )

call dpbtf2( uplo, n, kd, ab, ldab, info )

call cpbtf2( uplo, n, kd, ab, ldab, info )

call zpbtf2( uplo, n, kd, ab, ldab, info )

Description

The routine computes the Cholesky factorization of a real symmetric or complex Hermitianpositive definite band matrix A.

The factorization has the form

A = U'*U , if uplo = 'U', or

A = L*L', if uplo = 'L',

where U is an upper triangular matrix, U' is the transpose of U, and L is lower triangular. Thisis the unblocked version of the algorithm, calling BLAS Level 2 Routines.

Input Parameters



INTEGER. The number of super-diagonals of the matrix A ifuplo = 'U' , or the number of sub-diagonals if uplo ='L'.

kd

kd ≥ 0.

REAL for spbtf2ab

1545


DOUBLE PRECISION for dpbtf2COMPLEX for cpbtf2COMPLEX*16 for zpbtf2.Array, DIMENSION (ldab, n).On entry, the upper or lower triangle of the symmetric/Hermitian band matrix A, stored in the first kd+1 rows ofthe array. The j-th column of A is stored in the j-th columnof the array ab as follows:if uplo = 'U', ab(kd+1+i -j,j) = A(i, j for max(1,

j-kd) ≤ i ≤ j;

if uplo = 'L', ab(1+i -j,j) = A(i, j for j ≤ i ≤min(n, j+kd).

INTEGER. The leading dimension of the array ab. ldab ≥kd+1.

ldab

Output Parameters

On exit, If info = 0, the triangular factor U or L from theCholesky factorization A = U'*U or A = L*L' of the bandmatrix A, in the same storage format as A.

ab

INTEGER.info= 0: successful exit< 0: if info = -k, the k-th argument had an illegal value> 0: if info = k, the leading minor of order k is not positivedefinite, and the factorization could not be completed.

1546

?potf2Computes the Cholesky factorization of asymmetric/Hermitian positive-definite matrix(unblocked algorithm).

Syntax

call spotf2( uplo, n, a, lda, info )

call dpotf2( uplo, n, a, lda, info )

call cpotf2( uplo, n, a, lda, info )

call zpotf2( uplo, n, a, lda, info )

Description

The routine ?potf2 computes the Cholesky factorization of a real symmetric or complexHermitian positive definite matrix A. The factorization has the form

A = U'*U, if uplo = 'U', or

A = L*L', if uplo = 'L',

where U is an upper triangular matrix and L is lower triangular.

This is the unblocked version of the algorithm, calling BLAS Level 2 Routines

Input Parameters

CHARACTER*1.uploSpecifies whether the upper or lower triangular part of thesymmetric/Hermitian matrix A is stored.= 'U': upper triangular= 'L': lower triangular


REAL for spotf2aDOUBLE PRECISION or dpotf2COMPLEX for cpotf2COMPLEX*16 for zpotf2.Array, DIMENSION (lda, n).On entry, the symmetric/Hermitian matrix A.

1547


If uplo = 'U', the leading n-by-n upper triangular part ofA contains the upper triangular part of the matrix A, andthe strictly lower triangular part of A is not referenced.If uplo = 'L', the leading n-by-n lower triangular part ofA contains the lower triangular part of the matrix A, and thestrictly upper triangular part of A is not referenced.


lda ≥ max(1,n).

Output Parameters

On exit, If info = 0, the factor U or L from the Choleskyfactorization A = U'*U, or A = L*L'.

a

INTEGER.info= 0: successful exit< 0: if info = -k, the k-th argument had an illegal value> 0: if info = k, the leading minor of order k is not positivedefinite, and the factorization could not be completed.

?ptts2Solves a tridiagonal system of the form A*X=Busing the L*D*LH factorization computed by?pttrf.

Syntax

call sptts2( n, nrhs, d, e, b, ldb )

call dptts2( n, nrhs, d, e, b, ldb )

call cptts2( iuplo, n, nrhs, d, e, b, ldb )

call zptts2( iuplo, n, nrhs, d, e, b, ldb )

Description

The routine ?ptts2 solves a tridiagonal system of the form

A*X = B

1548

Real flavors sptts2/dptts2 use the L D L' factorization of A computed by spttrf/dpttrf,and complex flavors cptts2/zptts2 use the U' D U or L D L' factorization of A computed bycpttrf/zpttrf.

D is a diagonal matrix specified in the vector d, U (or L) is a unit bidiagonal matrix whosesuperdiagonal (subdiagonal) is specified in the vector e, and X and B are n-by-nrhs matrices.

Input Parameters

INTEGER. Used with complex flavors only.iuploSpecifies the form of the factorization and whether thevector e is the superdiagonal of the upper bidiagonal factorU or the subdiagonal of the lower bidiagonal factor L.= 1: A = U'*D*U, e is the superdiagonal of U;= 0: A = L*D*L', e is the subdiagonal of L

INTEGER. The order of the tridiagonal matrix A. n ≥ 0.n

INTEGER. The number of right hand sides, that is, the

number of columns of the matrix B. nrhs ≥ 0.

nrhs

REAL for sptts2/cptts2dDOUBLE PRECISION for dptts2/zptts2.Array, DIMENSION (n).The n diagonal elements of the diagonal matrix D from thefactorization of A.

REAL for sptts2eDOUBLE PRECISION for dptts2COMPLEX for cptts2COMPLEX*16 for zptts2.Array, DIMENSION (n-1).Contains the (n-1) subdiagonal elements of the unitbidiagonal factor L from the LDL' factorization of A (for realflavors, or for complex flavors when iuplo = 0).For complex flavors when iuplo = 1, e contains the (n-1)superdiagonal elements of the unit bidiagonal factor U fromthe factorization A = U'*D*U.

REAL for sptts2/cptts2BDOUBLE PRECISION for dptts2/zptts2.Array, DIMENSION (ldb, nrhs).

1549


On entry, the right hand side vectors B for the system oflinear equations.

INTEGER. The leading dimension of the array B. ldb ≥max(1,n).

ldb

Output Parameters

On exit, the solution vectors, X.b

?rsclMultiplies a vector by the reciprocal of a real scalar.

Syntax

call srscl( n, sa, sx, incx )

call drscl( n, sa, sx, incx )

call csrscl( n, sa, sx, incx )

call zdrscl( n, sa, sx, incx )

Description

The routine ?rscl multiplies an n-element real/complex vector x by the real scalar 1/a. Thisis done without overflow or underflow as long as the final result x/a does not overflow orunderflow.

Input Parameters

INTEGER. The number of components of the vector x.n

REAL for srscl/csrsclsaDOUBLE PRECISION for drscl/zdrscl.The scalar a which is used to divide each component of the

vector x. sa must be ≥ 0, or the subroutine will divide byzero.

REAL for srsclsxDOUBLE PRECISION for drsclCOMPLEX for csrsclCOMPLEX*16 for zdrscl.

1550


Array, DIMENSION (1+(n-1)*abs(incx )).The n-element vector x.

INTEGER. The increment between successive values of thevector sx.

incx

If incx > 0, sx(1) = x(1) and sx(1+(i-1)*incx) = x

(i), 1< i ≤ n.

Output Parameters

On exit, the result x/a.sx

?sygs2/?hegs2Reduces a symmetric/Hermitian definitegeneralized eigenproblem to standard form, usingthe factorization results obtained from ?potrf(unblocked algorithm).

Syntax

call ssygs2( itype, uplo, n, a, lda, b, ldb, info )

call dsygs2( itype, uplo, n, a, lda, b, ldb, info )

call chegs2( itype, uplo, n, a, lda, b, ldb, info )

call zhegs2( itype, uplo, n, a, lda, b, ldb, info )

Description

The routine ?sygs2/?hegs2 reduces a real symmetric-definite or a complex Hermitian-definitegeneralized eigenproblem to standard form.

If itype = 1, the problem is

A*x = λ*B*x,

and A is overwritten by inv(U')* A*inv(U) or inv(L)*A*inv(L').

If itype = 2 or 3, the problem is A*B*x = λ*x, or B*A*x = λ*x,

and A is overwritten by U*A*U' or L'*A*L. B must have been previously factorized as U'*U orL*L' by ?potrf.

1551

Input Parameters

INTEGER.itype= 1: compute inv(U')*A*inv(U), or inv(L)*A*inv(L');= 2 or 3: compute U*A*U', or L'*A*L.

CHARACTER*1. Specifies whether the upper or lowertriangular part of the symmetric/Hermitian matrix A isstored, and how B has been factorized.

uplo

= 'U': upper triangular= 'L': lower triangular

INTEGER. The order of the matrices A and B. n ≥ 0.n

REAL for ssygs2aDOUBLE PRECISION for dsygs2COMPLEX for chegs2COMPLEX*16 for zhegs2.Array, DIMENSION (lda, n).On entry, the symmetric/Hermitian matrix A.If uplo = 'U', the leading n-by-n upper triangular part ofA contains the upper triangular part of the matrix A, andthe strictly lower triangular part of A is not referenced.If uplo = 'L', the leading n-by-n lower triangular part ofA contains the lower triangular part of the matrix A, and thestrictly upper triangular part of A is not referenced.

INTEGER.lda

The leading dimension of the array a. lda ≥ max(1,n).

REAL for ssygs2bDOUBLE PRECISION for dsygs2COMPLEX for chegs2COMPLEX*16 for zhegs2.Array, DIMENSION (ldb, n).The triangular factor from the Cholesky factorization of Bas returned by ?potrf.

INTEGER. The leading dimension of the array B. ldb ≥max(1,n).

ldb

1552


Output Parameters

On exit, If info = 0, the transformed matrix, stored in thesame format as A.

a

INTEGER.info= 0: successful exit.< 0: if info = -i, the i-th argument had an illegal value.

?sytd2/?hetd2Reduces a symmetric/Hermitian matrix to realsymmetric tridiagonal form by anorthogonal/unitary similaritytransformation(unblocked algorithm).

Syntax

call ssytd2( uplo, n, a, lda, d, e, tau, info )

call dsytd2( uplo, n, a, lda, d, e, tau, info )

call chetd2( uplo, n, a, lda, d, e, tau, info )

call zhetd2( uplo, n, a, lda, d, e, tau, info )

Description

The routine ?sytd2/?hetd2 reduces a real symmetric/complex Hermitian matrix A to realsymmetric tridiagonal form T by an orthogonal/unitary similarity transformation: Q'*A*Q =T.

Input Parameters



REAL for ssytd2aDOUBLE PRECISION for dsytd2

1553

COMPLEX for chetd2COMPLEX*16 for zhetd2.Array, DIMENSION (lda, n).On entry, the symmetric/Hermitian matrix A.If uplo = 'U', the leading n-by-n upper triangular part ofA contains the upper triangular part of the matrix A, andthe strictly lower triangular part of A is not referenced.If uplo = 'L', the leading n-by-n lower triangular part ofA contains the lower triangular part of the matrix A, and thestrictly upper triangular part of A is not referenced.


lda

Output Parameters

On exit, if uplo = 'U', the diagonal and first superdiagonalof A are overwritten by the corresponding elements of thetridiagonal matrix T, and the elements above the first

a

superdiagonal, with the array tau, represent theorthogonal/unitary matrix Q as a product of elementaryreflectors;if uplo = 'L', the diagonal and first subdiagonal of A areoverwritten by the corresponding elements of the tridiagonalmatrix T, and the elements below the first subdiagonal, withthe array tau, represent the orthogonal/unitary matrix Q asa product of elementary reflectors.

REAL for ssytd2/chetd2dDOUBLE PRECISION for dsytd2/zhetd2.Array, DIMENSION (n).The diagonal elements of the tridiagonal matrix T:d(i) = a(i,i).

REAL for ssytd2/chetd2eDOUBLE PRECISION for dsytd2/zhetd2.Array, DIMENSION (n-1).The off-diagonal elements of the tridiagonal matrix T:e(i) = a(i,i+1) if uplo = 'U',e(i) = a(i+1,i) if uplo = 'L'.

1554


REAL for ssytd2tauDOUBLE PRECISION for dsytd2COMPLEX for chetd2COMPLEX*16 for zhetd2.Array, DIMENSION (n-1).The scalar factors of the elementary reflectors .


?sytf2Computes the factorization of a real/complexsymmetric indefinite matrix, using the diagonalpivoting method (unblocked algorithm).

Syntax

call ssytf2( uplo, n, a, lda, ipiv, info )

call dsytf2( uplo, n, a, lda, ipiv, info )

call csytf2( uplo, n, a, lda, ipiv, info )

call zsytf2( uplo, n, a, lda, ipiv, info )

Description

The routine ?sytf2 computes the factorization of a real/complex symmetric matrix A using theBunch-Kaufman diagonal pivoting method:

A = U*D*U' or A = L*D*L'

where U (or L) is a product of permutation and unit upper (lower) triangular matrices, U' is thetranspose of U, and D is symmetric and block diagonal with 1-by-1 and 2-by-2 diagonal blocks.

This is the unblocked version of the algorithm, calling BLAS Level 2 Routines.

Input Parameters

CHARACTER*1.uploSpecifies whether the upper or lower triangular part of thesymmetric matrix A is stored= 'U': upper triangular

1555


= 'L': lower triangular


REAL for ssytf2aDOUBLE PRECISION for dsytf2COMPLEX for csytf2COMPLEX*16 for zsytf2.Array, DIMENSION (lda, n).On entry, the symmetric matrix A.If uplo = 'U', the leading n-by-n upper triangular part ofA contains the upper triangular part of the matrix A, andthe strictly lower triangular part of A is not referenced.If uplo = 'L', the leading n-by-n lower triangular part ofA contains the lower triangular part of the matrix A, and thestrictly upper triangular part of A is not referenced.

INTEGER.lda

The leading dimension of the array a. lda ≥ max(1,n).

Output Parameters

On exit, the block diagonal matrix D and the multipliers usedto obtain the factor U or L.

a

INTEGER.ipivArray, DIMENSION (n).Details of the interchanges and the block structure of DIf ipiv(k) > 0, then rows and columns k and ipiv(k) wereinterchanged and D(k,k) is a 1-by-1 diagonal block.If uplo = 'U' and ipiv(k) = ipiv(k-1) < 0, then rowsand columns k-1 and -ipiv(k) were interchanged and D(k,k)is a 2-by-2 diagonal block. If uplo = 'L' and ipiv( k) =ipiv( k+1)< 0, then rows and columns k+1 and -ipiv(k)were interchanged and D(k:k+1,k:k+1) is a 2-by-2 diagonalblock.


1556

> 0: if info = k, D(k,k) is exactly zero. The factorizationhas been completed, but the block diagonal matrix D isexactly singular, and division by zero will occur if it is usedto solve a system of equations.

?hetf2Computes the factorization of a complex Hermitianmatrix, using the diagonal pivoting method(unblocked algorithm).

Syntax

call chetf2( uplo, n, a, lda, ipiv, info )

call zhetf2( uplo, n, a, lda, ipiv, info )

Description

The routine computes the factorization of a complex Hermitian matrix A using the Bunch-Kaufmandiagonal pivoting method:

A = U*D*U' or A = L*D*L'

where U (or L) is a product of permutation and unit upper (lower) triangular matrices, U' is theconjugate transpose of U, and D is Hermitian and block diagonal with 1-by-1 and 2-by-2 diagonalblocks.

This is the unblocked version of the algorithm, calling BLAS Level 2 Routines.

Input Parameters

CHARACTER*1.uploSpecifies whether the upper or lower triangular part of theHermitian matrix A is stored:= 'U': Upper triangular= 'L': Lower triangular


COMPLEX for chetf2ACOMPLEX*16 for zhetf2.Array, DIMENSION (lda, n).On entry, the Hermitian matrix A.

1557


If uplo = 'U', the leading n-by-n upper triangular part ofA contains the upper triangular part of the matrix A, andthe strictly lower triangular part of A is not referenced.If uplo = 'L', the leading n-by-n lower triangular part ofA contains the lower triangular part of the matrix A, and thestrictly upper triangular part of A is not referenced.


lda

Output Parameters

On exit, the block diagonal matrix D and the multipliers usedto obtain the factor U or L.

a

INTEGER. Array, DIMENSION (n).ipivDetails of the interchanges and the block structure of DIf ipiv(k) > 0, then rows and columns k and ipiv(k) wereinterchanged and D(k,k) is a 1-by-1 diagonal block.If uplo = 'U' and ipiv(k) = ipiv( k-1) < 0, thenrows and columns k-1 and -ipiv(k) were interchanged andD(k-1:k,k-1:k ) is a 2-by-2 diagonal block.If uplo = 'L' and ipiv(k) = ipiv( k+1) < 0, thenrows and columns k+1 and -ipiv(k) were interchanged andD(k:k+1, k:k+1) is a 2-by-2 diagonal block.

INTEGER.info= 0: successful exit< 0: if info = -k, the k-th argument had an illegal value> 0: if info = k, D(k,k) is exactly zero. The factorizationhas been completed, but the block diagonal matrix D isexactly singular, and division by zero will occur if it is usedto solve a system of equations.

1558

?tgex2Swaps adjacent diagonal blocks in an upper (quasi)triangular matrix pair by an orthogonal/unitaryequivalence transformation.

Syntax

call stgex2( wantq, wantz, n, a, lda, b, ldb, q, ldq, z, ldz, j1, n1, n2,work, lwork, info )

call dtgex2( wantq, wantz, n, a, lda, b, ldb, q, ldq, z, ldz, j1, n1, n2,work, lwork, info )

call ctgex2( wantq, wantz, n, a, lda, b, ldb, q, ldq, z, ldz, j1, info )

call ztgex2( wantq, wantz, n, a, lda, b, ldb, q, ldq, z, ldz, j1, info )

Description

The real routines stgex2/dtgex2 swap adjacent diagonal blocks (A11, B11) and (A22, B22) ofsize 1-by-1 or 2-by-2 in an upper (quasi) triangular matrix pair (A, B) by an orthogonalequivalence transformation. (A, B) must be in generalized real Schur canonical form (as returnedby sgges/dgges), that is, A is block upper triangular with 1-by-1 and 2-by-2 diagonal blocks.B is upper triangular.

The complex routines ctgex2/ztgex2 swap adjacent diagonal 1-by-1 blocks (A11, B11) and(A22, B22) in an upper triangular matrix pair (A, B) by an unitary equivalence transformation.

(A, B) must be in generalized Schur canonical form, that is, A and B are both upper triangular.

All routines optionally update the matrices Q and Z of generalized Schur vectors:

Q(in) *A(in)* Z(in)' = Q(out)*A(out)* Z(out)'

Q(in)* B(in)* Z(in)' = Q(out)* B(out)* Z(out)'

Input Parameters

LOGICAL.wantqIf wantq = .TRUE. : update the left transformation matrixQ;If wantq = .FALSE. : do not update Q.

LOGICAL.wantz

1559


If wantz = .TRUE. : update the right transformation matrixZ;If wantz = .FALSE.: do not update Z.

INTEGER. The order of the matrices A and B. n ≥ 0.n

REAL for stgex2 DOUBLE PRECISION for dtgex2a, bCOMPLEX for ctgex2COMPLEX*16 for ztgex2.Arrays, DIMENSION (lda, n) and (ldb, n), respectively.On entry, the matrices A and B in the pair (A, B).


lda

INTEGER. The leading dimension of the array b. ldb ≥max(1,n).

ldb

REAL for stgex2 DOUBLE PRECISION for dtgex2Q, zCOMPLEX for ctgex2COMPLEX*16 for ztgex2.Arrays, DIMENSION (ldq, n) and (ldz, n), respectively.On entry, if wantq = .TRUE., q contains theorthogonal/unitary matrix Q, and if wantz = .TRUE., zcontains the orthogonal/unitary matrix Z.

INTEGER. The leading dimension of the array Q. ldq ≥ 1.ldq

If wantq = .TRUE., ldq ≥ n.

INTEGER. The leading dimension of the array z. ldz ≥ 1.ldz

If wantz = .TRUE., ldz ≥ n.

INTEGER.j1

The index to the first block (A11, B11). 1 ≤ j1 ≤ n.

INTEGER. Used with real flavors only. The order of the firstblock (A11, B11). n1 = 0, 1 or 2.

n1

INTEGER. Used with real flavors only. The order of thesecond block (A22, B22). n2 = 0, 1 or 2.

n2

REAL for stgex2workDOUBLE PRECISION for dtgex2.

1560


Workspace array, DIMENSION (max(1,lwork)). Used withreal flavors only.


lwork ≥ max(n*(n2+n1), 2*(n2+n1)2)

Output Parameters

On exit, the updated matrix A.a

On exit, the updated matrix B.B

On exit, the updated matrix Q.QNot referenced if wantq = .FALSE..

On exit, the updated matrix Z.zNot referenced if wantz = .FALSE..

INTEGER.info=0: Successful exit For stgex2/dtgex2: If info = 1, thetransformed matrix (A, B) would be too far from generalizedSchur form; the blocks are not swapped and (A, B) and (Q,Z) are unchanged. The problem of swapping is tooill-conditioned. If info = -16: lwork is too small.Appropriate value for lwork is returned in work(1).For ctgex2/ztgex2:If info = 1, the transformed matrix pair (A, B) would betoo far from generalized Schur form; the problem isill-conditioned.

1561


?tgsy2Solves the generalized Sylvester equation(unblocked algorithm).

Syntax

call stgsy2( trans, ijob, m, n, a, lda, b, ldb, c, ldc, d, ldd, e, lde, f,ldf, scale, rdsum, rdscal, iwork, pq, info )

call dtgsy2( trans, ijob, m, n, a, lda, b, ldb, c, ldc, d, ldd, e, lde, f,ldf, scale, rdsum, rdscal, iwork, pq, info )

call ctgsy2( trans, ijob, m, n, a, lda, b, ldb, c, ldc, d, ldd, e, lde, f,ldf, scale, rdsum, rdscal, iwork, pq, info )

call ztgsy2( trans, ijob, m, n, a, lda, b, ldb, c, ldc, d, ldd, e, lde, f,ldf, scale, rdsum, rdscal, iwork, pq, info )

Description

The routine ?tgsy2 solves the generalized Sylvester equation:

using Level 1 and 2 BLAS, where R and L are unknown m-by-n matrices, (A, D), ( B, E) and (C,F) are given matrix pairs of size m-by -m, n-by-n and m-by-n, respectively. For stgsy2/dtgsy2,pairs (A, D) and (B, E) must be in generalized Schur canonical form, that is, A, B are upper quasitriangular and D, E are upper triangular. For ctgsy2/ztgsy2, matrices A, B, D and E are uppertriangular (that is, (A, D) and (B, E) in generalized Schur form).

The solution (r, L) overwrites (C, F). 0 ≤ scale ≤ 1 is an output scaling factor chosen to avoidoverflow.

In matrix notation, solving equation (1) corresponds to solve

Zx = scale* b,

where Z is defined as

1562


Here Ik is the identity matrix of size k and X' is the transpose of X. kron(X, Y) denotes theKronecker product between the matrices X and Y.

If trans = 'T' , solve the transposed (conjugate transposed) system

Z'y = scale* b

for y, which is equivalent to solve for r and L in

This case is used to compute an estimate of Dif[(A, D), (B, E)] = sigma_min(Z) usingreverse communication with ?lacon.

?tgsy2 also (for ijob ≥ 1) contributes to the computation in ?tgsyl of an upper bound onthe separation between two matrix pairs. Then the input (A, D), (B, E) are sub-pencils of thematrix pair (two matrix pairs) in ?tgsyl. See ?tgsyl for details.

Input Parameters

CHARACTER*1.transIf trans = 'N', solve the generalized Sylvester equation(1);If trans = 'T': solve the 'transposed' system (3).

INTEGER. Specifies what kind of functionality is to beperformed.

ijob

If ijob = 0: solve (1) only.If ijob = 1: a contribution from this subsystem to aFrobenius norm-based estimate of the separation betweentwo matrix pairs is computed (look ahead strategy is used);

1563


If ijob = 2: a contribution from this subsystem to aFrobenius norm-based estimate of the separation betweentwo matrix pairs is computed (?gecon on sub-systems isused).Not referenced if trans = 'T'.

INTEGER. On entry, m specifies the order of A and D, andthe row dimension of C, F, r and L.

m

INTEGER. On entry, n specifies the order of B and E, andthe column dimension of C, F, r and L.

n

REAL for stgsy2a, bDOUBLE PRECISION for dtgsy2COMPLEX for ctgsy2COMPLEX*16 for ztgsy2.Arrays, DIMENSION (lda, m) and (ldb, n), respectively. Onentry, a contains an upper (quasi) triangular matrix A andB contains an upper (quasi) triangular matrix b.

INTEGER. The leading dimension of the array a. lda ≥max(1, m).

lda

INTEGER.ldb

The leading dimension of the array b. ldb ≥ max(1, n).

REAL for stgsy2c, fDOUBLE PRECISION for dtgsy2COMPLEX for ctgsy2COMPLEX*16 for ztgsy2.Arrays, DIMENSION (ldc, n) and (ldf, n), respectively. Onentry, c contains the right-hand-side of the first matrixequation in (1) and f contains the right-hand-side of thesecond matrix equation in (1).

INTEGER. The leading dimension of the array c. ldc ≥max(1, m).

ldc

REAL for stgsy2d, eDOUBLE PRECISION for dtgsy2COMPLEX for ctgsy2COMPLEX*16 for ztgsy2.

1564


Arrays, DIMENSION (ldd, m) and (lde, n), respectively. Onentry, d contains an upper triangular matrix D and e containsan upper triangular matrix E.

INTEGER. The leading dimension of the array d. ldd ≥max(1, m).

ldd

INTEGER. The leading dimension of the array e. lde ≥max(1, n).

lde

INTEGER. The leading dimension of the array f. ldf ≥max(1, m).

ldf

REAL for stgsy2/ctgsy2rdsumDOUBLE PRECISION for dtgsy2/ztgsy2.On entry, the sum of squares of computed contributions tothe Dif-estimate under computation by ?tgsyL, where thescaling factor rdscal has been factored out.

REAL for stgsy2/ctgsy2rdscalDOUBLE PRECISION for dtgsy2/ztgsy2.On entry, scaling factor used to prevent overflow in rdsum.

INTEGER. Used with real flavors only.iworkWorkspace array, DIMENSION (m+n+2).

Output Parameters

On exit, if ijob = 0, c has been overwritten by the solutionR.

c

On exit, if ijob = 0, f has been overwritten by the solutionL.

f

REAL for stgsy2/ctgsy2scaleDOUBLE PRECISION for dtgsy2/ztgsy2.

On exit, 0 ≤ scale ≤ 1. If 0 < scale < 1, the solutionsR and L (C and F on entry) hold the solutions to a slightlyperturbed system, but the input matrices A, B, D and E havenot been changed. If scale = 0, r and L hold the solutionsto the homogeneous system with C = F = 0. Normallyscale = 1.

1565

On exit, the corresponding sum of squares updated withthe contributions from the current sub-system.

rdsum

If trans = 'T', rdsum is not touched.Note that rdsum only makes sense when ?tgsy2 is calledby ?tgsyl.

On exit, rdscal is updated with respect to the currentcontributions in rdsum.

rdscal

If trans = 'T' , rdscal is not touched.Note that rdscal only makes sense when ?tgsy2 is calledby ?tgsyl.

INTEGER. Used with real flavors only.pqOn exit, the number of subsystems (of size 2-by-2, 4-by-4and 8-by-8) solved by the routine stgsy2/dtgsy2.

INTEGER. On exit, if info is set toinfo= 0: Successful exit< 0: If info = -i, the i-th argument had an illegal value.> 0: The matrix pairs (A, D) and (B, E) have common orvery close eigenvalues.

?trti2Computes the inverse of a triangular matrix(unblocked algorithm).

Syntax

call strti2( uplo, diag, n, a, lda, info )

call dtrti2( uplo, diag, n, a, lda, info )

call ctrti2( uplo, diag, n, a, lda, info )

call ztrti2( uplo, diag, n, a, lda, info )

Description

The routine ?trti2 computes the inverse of a real/complex upper or lower triangular matrix.

This is the Level 2 BLAS version of the algorithm.

1566

Input Parameters

CHARACTER*1.uploSpecifies whether the matrix A is upper or lower triangular.= 'U': upper triangular= 'L': lower triangular

CHARACTER*1.diagSpecifies whether or not the matrix A is unit triangular.= 'N': non-unit triangular= 'N': non-unit triangular


REAL for strti2aDOUBLE PRECISION for dtrti2COMPLEX for ctrti2COMPLEX*16 for ztrti2.Array, DIMENSION (lda, n).On entry, the triangular matrix A.If uplo = 'U', the leading n-by-n upper triangular part ofthe array a contains the upper triangular matrix, and thestrictly lower triangular part of A is not referenced.If uplo = 'L', the leading n-by-n lower triangular part ofthe array a contains the lower triangular matrix, and thestrictly upper triangular part of A is not referenced.If diag = 'U', the diagonal elements of A are also notreferenced and are assumed to be 1.


lda

Output Parameters

On exit, the (triangular) inverse of the original matrix, inthe same storage format.

a


1567


clag2zConverts a complex single precision matrix to acomplex double precision matrix.

Syntax

call clag2z( m, n, sa, ldsa, a, lda, info )

Description

The routine converts a complex single precision matrix SA to a complex double precision matrixA.

Note that while it is possible to overflow while converting from double to single, it is not possibleto overflow when converting from single to double.

This is a helper routine so there is no argument checking.

Input Parameters

INTEGER. The number of lines of the matrix A (m ≥ 0).m

INTEGER. The number of columns in the matrix A (n ≥ 0).n

INTEGER. The leading dimension of the array sa; ldsa ≥max(1, m).

ldsa

DOUBLE PRECISION array, DIMENSION (lda, n).aOn entry, contains the m-by-n coefficient matrix A.

INTEGER. The leading dimension of the array a; lda ≥max(1, m).

lda

Output Parameters

REAL array, DIMENSION (ldsa, n).saOn exit, contains the m-by-n coefficient matrix SA.


1568


dlag2sConverts a double precision matrix to a singleprecision matrix.

Syntax

call dlag2s( m, n, a, lda, sa, ldsa, info )

Description

The routine converts a double precision matrix SA to a single precision matrix A.

RMAX is the overflow for the single precision arithmetic. dlag2s checks that all the entries ofA are between -RMAX and RMAX. If not, the convertion is aborted and a flag is raised.


Input Parameters





lda


ldsa

Output Parameters

REAL array, DIMENSION (ldsa, n).saOn exit, if info = 0, contains the m-by-n coefficient matrixSA.

INTEGER.infoIf info = 0, the execution is successful.If info = k, the (i, j) entry of the matrix A has overflowedwhen moving from double precision to single precision. k isgiven by k = (i-1)*lda+j.

1569


slag2dConverts a single precision matrix to a doubleprecision matrix.

Syntax

call slag2d( m, n, sa, ldsa, a, lda, info )

Description

The routine converts a single precision matrix SA to a double precision matrix A.

Note that while it is possible to overflow while converting from double to single, it is not possibleto overflow when converting from single to double.


Input Parameters




ldsa



lda

Output Parameters



1570


zlag2cConverts a complex double precision matrix to acomplex single precision matrix.

Syntax

call zlag2c( m, n, a, lda, sa, ldsa, info )

Description

The routine converts a double precision complex matrix SA to a single precision complex matrixA.

RMAX is the overflow for the single precision arithmetic. zlag2c checks that all the entries ofA are between -RMAX and RMAX. If not, the convertion is aborted and a flag is raised.


Input Parameters





lda


ldsa

Output Parameters


INTEGER.infoIf info = 0, the execution is successful.If info = k, the (i, j) entry of the matrix A has overflowedwhen moving from double precision to single precision. k isgiven by k = (i-1)*lda+j.

1571


Utility Functions and RoutinesThis section describes LAPACK utility functions and routines. Summary information about theseroutines is given in the following table:

Table 5-2 LAPACK Utility Routines


Routine Name

Returns the version of the Lapack library.ilaver

Environmental enquiry function which returns values for tuningalgorithmic performance.

ilaenv

Environmental enquiry function which returns values for tuningalgorithmic performance.

iparmq

Checks if the infinity and NaN arithmetic is safe. Called byilaenv.

ieeeck

Tests two characters for equality regardless of case.lsame

Tests two character strings for equality regardless of case.lsamen

Returns the square root of the underflow and overflowthresholds if the exponent-range is very large.

s, d?labad

Determines machine parameters for floating-point arithmetic.s, d?lamch

Called from ?lamc2. Determines machine parameters givenby beta, t, rnd, ieee1.

s, d?lamc1

Used by ?lamch. Determines machine parameters specifiedin its arguments list.

s, d?lamc2

Called from ?lamc1-?lamc5. Intended to force a and b to bestored prior to doing the addition of a and b.

s, d?lamc3

This is a service routine for ?lamc2.s, d?lamc4

Called from ?lamc2. Attempts to compute the largest machinefloating-point number, without overflow.

s, d?lamc5

1572



Routine Name

Return user time for a process.second/dsecnd

Error handling routine called by LAPACK routines.xerbla

ilaverReturns the version of the Lapack library.

Syntax

call ilaver( vers_major, vers_minor, vers_patch )

Description

This routine returns the version of the Lapack library.

Output Parameters

INTEGER.vers_majorReturns the major version of the LAPACK library.

INTEGER.vers_minorReturns the minor version from the major version of theLAPACK library.

INTEGER.vers_patchReturns the patch version from the minor version of theLAPACK library.

ilaenvEnvironmental enquiry function which returnsvalues for tuning algorithmic performance.

Syntax

value = ilaenv( ispec, name, opts, n1, n2, n3, n4 )

1573


Description

Enquiry function ilaenv is called from the LAPACK routines to choose problem-dependentparameters for the local environment. See ispec below for a description of the parameters.

This version provides a set of parameters which should give good, but not optimal, performanceon many of the currently available computers. Users are encouraged to modify this subroutineto set the tuning parameters for their particular machine using the option and problem sizeinformation in the arguments.

This routine will not function correctly if it is converted to all lower case. Converting it to allupper case is allowed.

Input Parameters

INTEGER.ispecSpecifies the parameter to be returned as the value ofilaenv:= 1: the optimal blocksize; if this value is 1, an unblockedalgorithm will give the best performance.= 2: the minimum block size for which the block routineshould be used; if the usable block size is less than thisvalue, an unblocked routine should be used.= 3: the crossover point (in a block routine, for n less thanthis value, an unblocked routine should be used)= 4: the number of shifts, used in the nonsymmetriceigenvalue routines (deprecated)= 5: the minimum column dimension for blocking to beused; rectangular blocks must have dimension at leastk-by-m, where k is given by ilaenv(2,...) and m byilaenv(5,...)= 6: the crossover point for the SVD (when reducing anm-by-n matrix to bidiagonal form, if max(m,n)/min(m,n)exceeds this value, a QR factorization is used first to reducethe matrix to a triangular form.)= 7: the number of processors= 8: the crossover point for the multishift QR and QZmethods for nonsymmetric eigenvalue problems(deprecated).

1574


= 9: maximum size of the subproblems at the bottom ofthe computation tree in the divide-and-conquer algorithm(used by ?gelsd and ?gesdd)=10: ieee NaN arithmetic can be trusted not to trap=11: infinity arithmetic can be trusted not to trap

12 ≤ ispec ≤ 16: ?hseqr or one of its subroutines, seeiparmq for detailed explanation.

CHARACTER*(*). The name of the calling subroutine, ineither upper case or lower case.

name

CHARACTER*(*). The character options to the subroutinename, concatenated into a single character string. Forexample, uplo = 'U', trans = 'T', and diag = 'N' fora triangular routine would be specified as opts = 'UTN'.

opts

INTEGER. Problem dimensions for the subroutine name;these may not all be required.

n1, n2, n3, n4

Output Parameters

INTEGER.value

If value ≥ 0: the value of the parameter specified byispec;If value = -k < 0: the k-th argument had an illegal value.

Application Notes

The following conventions have been used when calling ilaenv from the LAPACK routines:

1. opts is a concatenation of all of the character options to subroutine name, in the same orderthat they appear in the argument list for name, even if they are not used in determining thevalue of the parameter specified by ispec.

2. The problem dimensions n1, n2, n3, n4 are specified in the order that they appear in theargument list for name. n1 is used first, n2 second, and so on, and unused problem dimensionsare passed a value of -1.

3. The parameter value returned by ilaenv is checked for validity in the calling subroutine.For example, ilaenv is used to retrieve the optimal blocksize for strtri as follows:

nb = ilaenv( 1, 'strtri', uplo // diag, n, -1, -1, -1> )

if( nb.le.1 ) nb = max( 1, n )

1575

Below is an example of ilaenv usage in C language:

#include <stdio.h>

#include "mkl.h"

int main(void)

{

int size = 1000;

int ispec = 1;

int dummy = -1;

int blockSize1 = ilaenv(&ispec, "dsytrd", "U", &size, &dummy, &dummy,&dummy);

int blockSize2 = ilaenv(&ispec, "dormtr", "LUN", &size, &size, &dummy,&dummy);

printf("DSYTRD blocksize = %d\n", blockSize1);

printf("DORMTR blocksize = %d\n", blockSize2);

return 0;

}

iparmqEnvironmental enquiry function which returnsvalues for tuning algorithmic performance.

Syntax

value = iparmq( ispec, name, opts, n, ilo, ihi, lwork )

Description

This function sets problem and machine dependent parameters useful for ?hseqr and its

subroutines. It is called whenever ilaenv is called with 12≤ispec≤16.

Input Parameters

INTEGER.ispecSpecifies the parameter to be returned as the value ofiparmq:

1576

= 12: (inmin) Matrices of order nmin or less are sentdirectly to ?lahqr, the implicit double shift QR algorithm.nmin must be at least 11.= 13: (inwin) Size of the deflation window. This is best setgreater than or equal to the number of simultaneous shiftsns. Larger matrices benefit from larger deflation windows.= 14: (inibl) Determines when to stop nibbling and investin an (expensive) multi-shift QR sweep. If the aggressiveearly deflation subroutine finds ld converged eigenvaluesfrom an order nw deflation window andld>(nw*nibble)/100, then the next QR sweep is skippedand early deflation is applied immediately to the remainingactive diagonal block. Setting iparmq(ispec=14)=0 causesTTQRE to skip a multi-shift QR sweep whenever earlydeflation finds a converged eigenvalue. Settingiparmq(ispec=14) greater than or equal to 100 preventsTTQRE from skipping a multi-shift QR sweep.= 15: (nshfts) The number of simultaneous shifts in amulti-shift QR iteration.= 16: (iacc22) iparmq is set to 0, 1 or 2 with the followingmeanings.0: During the multi-shift QR sweep, ?laqr5 does notaccumulate reflections and does not use matrix-matrixmultiply to update the far-from-diagonal matrix entries.1: During the multi-shift QR sweep, ?laqr5 and/or ?laqr3accumulates reflections and uses matrix-matrix multiply toupdate the far-from-diagonal matrix entries.2: During the multi-shift QR sweep, ?laqr5 accumulatesreflections and takes advantage of 2-by-2 block structureduring matrix-matrix multiplies.(If ?trrm is slower than ?gemm, then iparmq(ispec=16)=1may be more efficient than iparmq(ispec=16)=2 despitethe greater level of arithmetic work implied by the latterchoice.)

CHARACTER*(*). The name of the calling subroutine.name

CHARACTER*(*). This is a concatenation of the stringarguments to TTQRE.

opts

INTEGER. n is the order of the Hessenberg matrix H.n

1577


INTEGER.ilo, ihiIt is assumed that H is already upper triangular in rows andcolumns 1:ilo-1 and ihi+1:n.

INTEGER.lworkThe amount of workspace available.

Output Parameters

INTEGER.value

If value ≥ 0: the value of the parameter specified byiparmq;If value = -k < 0: the k-th argument had an illegal value.

Application Notes

The following conventions have been used when calling ilaenv from the LAPACK routines:

1. opts is a concatenation of all of the character options to subroutine name, in the same orderthat they appear in the argument list for name, even if they are not used in determining thevalue of the parameter specified by ispec.

2. The problem dimensions n1, n2, n3, n4 are specified in the order that they appear in theargument list for name. n1 is used first, n2 second, and so on, and unused problem dimensionsare passed a value of -1.

3. The parameter value returned by ilaenv is checked for validity in the calling subroutine.For example, ilaenv is used to retrieve the optimal blocksize for strtri as follows:

nb = ilaenv( 1, 'strtri', uplo // diag, n, -1, -1, -1> )

if( nb.le.1 ) nb = max( 1, n )

ieeeckChecks if the infinity and NaN arithmetic is safe.Called by ilaenv.

Syntax

ival = ieeeck( ispec, zero, one )

1578

Description

The function ieeeck is called from ilaenv to verify that infinity and possibly NaN arithmeticis safe, that is, will not trap.

Input Parameters

INTEGER.ispecSpecifies whether to test just for inifinity arithmetic or bothfor infinity and NaN arithmetic:If ispec = 0: Verify infinity arithmetic only.If ispec = 1: Verify infinity and NaN arithmetic.

REAL. Must contain the value 0.0zeroThis is passed to prevent the compiler from optimizing awaythis code.

REAL. Must contain the value 1.0oneThis is passed to prevent the compiler from optimizing awaythis code.

Output Parameters

INTEGER.ivalIf ival = 0: Arithmetic failed to produce the correctanswers.If ival = 1: Arithmetic produced the correct answers.

lsameTests two characters for equality regardless ofcase.

Syntax

val = lsame(ca, cb)

Description

This logical function returns .TRUE. if ca is the same letter as cb regardless of case.

Input Parameters

CHARACTER*1. Specify the single characters to be compared.ca, cb

1579


Output Parameters

LOGICAL. Result of the comparison.val

lsamenTests two character strings for equality regardlessof case.

Syntax

val = lsamen(n, ca, cb)

Description

This logical function tests if the first n letters of the string ca are the same as the first n lettersof cb, regardless of case. The function lsamen returns .TRUE. if ca and cb are equivalentexcept for case and .FALSE. otherwise. lsamen also returns .FALSE. if len(ca) or len(cb)is less than n.

Input Parameters

INTEGER. The number of characters in ca and cb to becompared.

n

CHARACTER*(*). Specify two character strings of length atleast n to be compared. Only the first n characters of eachstring will be accessed.

ca, cb

Output Parameters


1580


?labadReturns the square root of the underflow andoverflow thresholds if the exponent-range is verylarge.

Syntax

call slabad( small, large )

call dlabad( small, large )

Description

This routine takes as input the values computed by slamch/dlamch for underflow and overflow,and returns the square root of each of these values if the log of large is sufficiently large. Thissubroutine is intended to identify machines with a large exponent range, such as the Crays,and redefine the underflow and overflow limits to be the square roots of the values computedby ?lamch. This subroutine is needed because ?lamch does not compensate for poor arithmeticin the upper half of the exponent range, as is found on a Cray.

Input Parameters

REAL for slabadsmallDOUBLE PRECISION for dlabad.The underflow threshold as computed by ?lamch.

REAL for slabadlargeDOUBLE PRECISION for dlabad.The overflow threshold as computed by ?lamch.

Output Parameters

On exit, if log10(large) is sufficiently large, the squareroot of small, otherwise unchanged.

small

On exit, if log10(large) is sufficiently large, the squareroot of large, otherwise unchanged.

large

1581


?lamchDetermines machine parameters for floating-pointarithmetic.

Syntax

val = slamch( cmach )

val = dlamch( cmach )

Description

The function ?lamch determines single precision and double precision machine parameters.

Input Parameters

CHARACTER*1. Specifies the value to be returned by ?lamch:cmach= 'E' or 'e', val = eps= 'S' or 's , val = sfmin= 'B' or 'b', val = base= 'P' or 'p', val = eps*base= 'n' or 'n', val = t= 'R' or 'r', val = rnd= 'm' or 'm', val = emin= 'U' or 'u', val = rmin= 'L' or 'l', val = emax= 'O' or 'o', val = rmaxwhereeps = relative machine precision;sfmin = safe minimum, such that 1/sfmin does notoverflow;base = base of the machine;prec = eps*base;t = number of (base) digits in the mantissa;rnd = 1.0 when rounding occurs in addition, 0.0 otherwise;emin = minimum exponent before (gradual) underflow;rmin = underflow_threshold - base**(emin-1);emax = largest exponent before overflow;rmax = overflow_threshold - (base**emax)*(1-eps).

1582


Output Parameters

REAL for slamchvalDOUBLE PRECISION for dlamchValue returned by the function.

?lamc1Called from ?lamc2. Determines machineparameters given by beta, t, rnd, ieee1.

Syntax

call slamc1( beta, t, rnd, ieee1 )

call dlamc1( beta, t, rnd, ieee1 )

Description

The routine ?lamc1 determines machine parameters given by beta, t, rnd, ieee1.

Output Parameters

INTEGER. The base of the machine.beta

INTEGER. The number of (beta) digits in the mantissa.t

LOGICAL.rndSpecifies whether proper rounding ( rnd = .TRUE. ) orchopping ( rnd = .FALSE. ) occurs in addition. This maynot be a reliable guide to the way in which the machineperforms its arithmetic.

LOGICAL.ieee1Specifies whether rounding appears to be done in the ieee'round to nearest' style.

1583


?lamc2Used by ?lamch. Determines machine parametersspecified in its arguments list.

Syntax

call slamc2( beta, t, rnd, eps, emin, rmin, emax, rmax )

call dlamc2( beta, t, rnd, eps, emin, rmin, emax, rmax )

Description

The routine ?lamc2 determines machine parameters specified in its arguments list.

Output Parameters

INTEGER. The base of the machine.beta

INTEGER. The number of (beta) digits in the mantissa.t

LOGICAL.rndSpecifies whether proper rounding (rnd = .TRUE.) orchopping (rnd = .FALSE.) occurs in addition. This maynot be a reliable guide to the way in which the machineperforms its arithmetic.

REAL for slamc2epsDOUBLE PRECISION for dlamc2The smallest positive number such thatfl(1.0 - eps) < 1.0,where fl denotes the computed value.

INTEGER. The minimum exponent before (gradual) underflowoccurs.

emin

REAL for slamc2rminDOUBLE PRECISION for dlamc2The smallest normalized number for the machine, given bybaseemin-1,where base is the floating point value of beta.

INTEGER.The maximum exponent before overflow occurs.emax

REAL for slamc2rmaxDOUBLE PRECISION for dlamc2

1584

The largest positive number for the machine, given bybaseemax(1 - eps), where base is the floating point value ofbeta.

?lamc3Called from ?lamc1-?lamc5. Intended to force aand b to be stored prior to doing the addition of aand b.

Syntax

val = slamc3( a, b )

val = dlamc3( a, b )

Description

The routine is intended to force A and B to be stored prior to doing the addition of A and B, foruse in situations where optimizers might hold one of these in a register.

Input Parameters

REAL for slamc3a, bDOUBLE PRECISION for dlamc3The values a and b.

Output Parameters

REAL for slamc3valDOUBLE PRECISION for dlamc3The result of adding values a and b.

?lamc4This is a service routine for ?lamc2.

Syntax

call slamc4( emin, start, base )

call dlamc4( emin, start, base )

1585


Description

This is a service routine for ?lamc2.

Input Parameters

REAL for slamc4startDOUBLE PRECISION for dlamc4The starting point for determining emin.

INTEGER. The base of the machine.base

Output Parameters

INTEGER. The minimum exponent before (gradual)underflow, computed by setting a = start and dividing bybase until the previous a can not be recovered.

emin

?lamc5Called from ?lamc2. Attempts to compute thelargest machine floating-point number, withoutoverflow.

Syntax

call slamc5( beta, p, emin, ieee, emax, rmax)

call dlamc5( beta, p, emin, ieee, emax, rmax)

Description

The routine ?lamc5 attempts to compute rmax, the largest machine floating-point number,without overflow. It assumes that emax + abs(emin) sum approximately to a power of 2. Itwill fail on machines where this assumption does not hold, for example, the Cyber 205 (emin= -28625, emax = 28718). It will also fail if the value supplied for emin is too large (that is,too close to zero), probably with overflow.

Input Parameters

INTEGER. The base of floating-point arithmetic.beta

INTEGER. The number of base beta digits in the mantissaof a floating-point value.

p

1586


INTEGER. The minimum exponent before (gradual)underflow.

emin

LOGICAL. A logical flag specifying whether or not thearithmetic system is thought to comply with the IEEEstandard.

ieee

Output Parameters

INTEGER. The largest exponent before overflow.emax

REAL for slamc5rmaxDOUBLE PRECISION for dlamc5The largest machine floating-point number.

second/dsecndReturn user time for a process.

Syntax

val = second()

val = dsecnd()

Description

The functions second/dsecnd return the user time for a process in seconds. These versionsget the time from the system function etime. The difference is that dsecnd returns the resultwith double presision.

Output Parameters

REAL for secondvalDOUBLE PRECISION for dsecndUser time for a process.

1587


xerblaError handling routine called by BLAS, LAPACK,VML routines.

Syntax

call xerbla( srname, info )

Description

The routine xerbla is an error handler for the BLAS, LAPACK, and VML routines. It is called bya BLAS, LAPACK, or VML routine if an input parameter has an invalid value.

A message is printed and execution stops.

Installers may consider modifying the stop statement in order to call system-specificexception-handling facilities.

Input Parameters

CHARACTER*6 The name of the routine which called xerbla.srname

INTEGER. The position of the invalid parameter in theparameter list of the calling routine.

info

1588


6ScaLAPACK Routines

This chapter describes the Intel® Math Kernel Library implementation of routines from the ScaLAPACKpackage for distributed-memory architectures. Routines are supported for both real and complex denseand band matrices to perform the tasks of solving systems of linear equations, solving linear least-squaresproblems, eigenvalue and singular value problems, as well as performing a number of related computationaltasks. All routines are available in both single precision and double precision.

NOTE. ScaLAPACK routines are provided with Intel® Cluster MKL product only which is asuperset of Intel MKL.

Sections in this chapter include descriptions of ScaLAPACK computational routines that perform distinctcomputational tasks, as well as driver routines for solving standard types of problems in one call.

Generally, ScaLAPACK runs on a network of computers using MPI as a message-passing layer and a setof prebuilt communication subprograms (BLACS), as well as a set of BLAS optimized for the targetarchitecture. Intel® Cluster MKL version of ScaLAPACK is optimized for Intel® processors. For the detailedsystem and environment requirements see Intel MKL Release Notes and Intel MKL User's Guide.

For full reference on ScaLAPACK routines and related information see [SLUG].

OverviewThe model of the computing environment for ScaLAPACK is represented as a one-dimensional arrayof processes (for operations on band or tridiagonal matrices) or also a two-dimensional process grid(for operations on dense matrices). To use ScaLAPACK, all global matrices or vectors should bedistributed on this array or grid prior to calling the ScaLAPACK routines.

ScaLAPACK uses the two-dimensional block-cyclic data distribution as a layout for dense matrixcomputations. This distribution provides good work balance between available processors, as wellas gives the opportunity to use BLAS Level 3 routines for optimal local computations. Informationabout the data distribution that is required to establish the mapping between each global array andits corresponding process and memory location is contained in the so called array descriptorassociated with each global array. An example of an array descriptor structure is given in Table 6-1.

Table 6-1 Content of the array descriptor for dense matrices

DefinitionNameArray Element #

Descriptor type ( =1 for dense matrices)dtype1

BLACS context handle for the process gridctxt2

1589

DefinitionNameArray Element #

Number of rows in the global arraym3

Number of columns in the global arrayn4

Row blocking factormb5

Column blocking factornb6

Process row over which the first row of the global array isdistributed

rsrc7

Process column over which the first column of the global arrayis distributed

csrc8

Leading dimension of the local arraylld9

The number of rows and columns of a global dense matrix that a particular process in a gridreceives after data distributing is denoted by LOCr() and LOCc(), respectively. To computethese numbers, you can use the ScaLAPACK tool routine numroc.

After the block-cyclic distribution of global data is done, you may choose to perform an operationon a submatrix of the global matrix A, which is contained in the global subarray sub(A), definedby the following 6 values (for dense matrices):

The number of rows of sub(A)m

The number of columns of sub(A)n

A pointer to the local array containing the entire global array Aa

The row index of sub(A) in the global arrayia

The column index of sub(A) in the global arrayja

The array descriptor for the global arraydesca

Routine Naming ConventionsFor each routine introduced in this chapter, you can use the ScaLAPACK name. The namingconvention for ScaLAPACK routines is similar to that used for LAPACK routines (see RoutineNaming Conventions in Chapter 4). A general rule is that each routine name in ScaLAPACK,which has an LAPACK equivalent, is simply the LAPACK name prefixed by initial letter p.

ScaLAPACK names have the structure p?yyzzz or p?yyzz, which is described below.

The initial letter p is a distinctive prefix of ScaLAPACK routines and is present in each suchroutine.

The second symbol ? indicates the data type:



1590




The second and third letters yy indicate the matrix type as:

generalge

general bandgb

a pair of general matrices (for a generalized problem)gg

general tridiagonal (diagonally dominant-like)dt

general band (diagonally dominant-like)db

symmetric or Hermitian positive-definitepo

symmetric or Hermitian positive-definite bandpb

symmetric or Hermitian positive-definite tridiagonalpt

symmetricsy

symmetric tridiagonal (real)st

Hermitianhe

orthogonalor

triangular (or quasi-triangular)tr

trapezoidaltz

unitaryun

For computational routines, the last three letters zzz indicate the computation performed andhave the same meaning as for LAPACK routines.

For driver routines, the last two letters zz or three letters zzz have the following meaning:

a simple driver for solving a linear systemsv

an expert driver for solving a linear systemsvx

a driver for solving a linear least squares problemls

a simple driver for solving a symmetric eigenvalue problemev

an expert driver for solving a symmetric eigenvalue problemevx

a driver for computing a singular value decompositionsvd

an expert driver for solving a generalized symmetric definite eigenvalueproblem

gvx

Simple driver here means that the driver just solves the general problem, whereas an expertdriver is more versatile and can also optionally perform some related computations (such, forexample, as refining the solution and computing error bounds after the linear system is solved).

1591

ScaLAPACK Routines 6

Computational RoutinesIn the sections that follow, the descriptions of ScaLAPACK computational routines are given.These routines perform distinct computational tasks that can be used for:

• Solving Systems of Linear Equations

• Orthogonal Factorizations and LLS Problems

• Symmetric Eigenproblems

• Nonsymmetric Eigenproblems

• Singular Value Decomposition

• Generalized Symmetric-Definite Eigenproblems

See also the respective driver routines.

Linear Equations

ScaLAPACK supports routines for the systems of equations with the following types of matrices:

• general

• general banded

• general diagonally dominant-like banded (including general tridiagonal)

• symmetric or Hermitian positive-definite

• symmetric or Hermitian positive-definite banded

• symmetric or Hermitian positive-definite tridiagonal

A diagonally dominant-like matrix is defined as a matrix for which it is known in advancethat pivoting is not required in the LU factorization of this matrix.

For the above matrix types, the library includes routines for performing the followingcomputations: factoring the matrix; equilibrating the matrix; solving a system of linearequations; estimating the condition number of a matrix; refining the solution of linear equationsand computing its error bounds; inverting the matrix. Note that for some of the listed matrixtypes only part of the computational routines are provided (for example, routines that refinethe solution are not provided for band or tridiagonal matrices). See Table 6-2 for full list ofavailable routines.

To solve a particular problem, you can either call two or more computational routines or call acorresponding driver routine that combines several tasks in one call. Thus, to solve a systemof linear equations with a general matrix, you can first call p?getrf(LU factorization) and then

1592


p?getrs(computing the solution). Then, you might wish to call p?gerfs to refine the solutionand get the error bounds. Alternatively, you can just use the driver routine p?gesvx whichperforms all these tasks in one call.

Table 6-2 lists the ScaLAPACK computational routines for factorizing, equilibrating, and invertingmatrices, estimating their condition numbers, solving systems of equations with real matrices,refining the solution, and estimating its error.

Table 6-2 Computational Routines for Systems of Linear Equations

Invertmatrix

Estimateerror

Conditionnumber

Solvesystem

Equilibratematrix

Factorizematrix

Matrix type, storagescheme

p?getrip?gerfsp?geconp?getrsp?geequp?getrfgeneral (partialpivoting)

p?gbtrsp?gbtrfgeneral band (partialpivoting)

p?dbtrsp?dbtrfgeneral band (nopivoting)

p?dttrsp?dttrfgeneral tridiagonal (nopivoting)

p?potrip?porfsp?poconp?potrsp?poequp?potrfsymmetric/Hermitianpositive-definite

p?pbtrsp?pbtrfsymmetric/Hermitianpositive-definite, band

p?pttrsp?pttrfsymmetric/Hermitianpositive-definite,tridiagonal

p?trtrip?trrfsp?trconp?trtrstriangular

In this table ? stands for s (single precision real), d (double precision real), c (single precisioncomplex), or z (double precision complex).

Routines for Matrix Factorization

This section describes the ScaLAPACK routines for matrix factorization. The followingfactorizations are supported:

• LU factorization of general matrices

• LU factorization of diagonally dominant-like matrices

• Cholesky factorization of real symmetric or complex Hermitian positive-definite matrices

You can compute the factorizations using full and band storage of matrices.

1593


p?getrfComputes the LU factorization of a general m-by-ndistributed matrix.

Syntax

call psgetrf(m, n, a, ia, ja, desca, ipiv, info)

call pdgetrf(m, n, a, ia, ja, desca, ipiv, info)

call pcgetrf(m, n, a, ia, ja, desca, ipiv, info)

call pzgetrf(m, n, a, ia, ja, desca, ipiv, info)

Description

The routine forms the LU factorization of a general m-by-n distributed matrix sub(A) =A(ia:ia+n-1, ja:ja+n-1) as

A = P*L*U

where P is a permutation matrix, L is lower triangular with unit diagonal elements (lowertrapezoidal if m > n) and U is upper triangular (upper trapezoidal if m < n). L and U are storedin sub(A).

The routine uses partial pivoting, with row interchanges.

Input Parameters

(global) INTEGER. The number of rows in the distributed

submatrix sub(A); m≥0.

m

(global) INTEGER. The number of columns in the distributed

submatrix sub(A); n≥0.

n

(local)aREAL for psgetrfDOUBLE PRECISION for pdgetrfCOMPLEX for pcgetrfDOUBLE COMPLEX for pzgetrf.Pointer into the local memory to an array of local dimension(lld_a, LOCc(ja+n-1)).Contains the local pieces of the distributed matrix sub(A)to be factored.

1594

(global) INTEGER. The row and column indices in the globalarray A indicating the first row and the first column of thesubmatrix A(ia:ia+n-1, ja:ja+n-1), respectively.

ia, ja

(global and local) INTEGER array, dimension (dlen_). Thearray descriptor for the distributed matrix A.

desca

Output Parameters

Overwritten by local pieces of the factors L and U from thefactorization A = P*L*U. The unit diagonal elements of L arenot stored.

a

(local) INTEGER array.ipivThe dimension of ipiv is (LOCr(m_a)+ mb_a).This array contains the pivoting information: local row iwas interchanged with global row ipiv(i). This array istied to the distributed matrix A.

(global) INTEGER.infoIf info=0, the execution is successful.info < 0: if the i-th argument is an array and the j-thentry had an illegal value, then info = -(i*100+j); if thei-th argument is a scalar and had an illegal value, theninfo = -i.If info = i, uii is 0. The factorization has been completed,but the factor U is exactly singular. Division by zero willoccur if you use the factor U for solving a system of linearequations.

1595

p?gbtrfComputes the LU factorization of a general n-by-nbanded distributed matrix.

Syntax

call psgbtrf(n, bwl, bwu, a, ja, desca, ipiv, af, laf, work, lwork, info)

call pdgbtrf(n, bwl, bwu, a, ja, desca, ipiv, af, laf, work, lwork, info)

call pcgbtrf(n, bwl, bwu, a, ja, desca, ipiv, af, laf, work, lwork, info)

call pzgbtrf(n, bwl, bwu, a, ja, desca, ipiv, af, laf, work, lwork, info)

Description

The routine computes the LU factorization of a general n-by-n real/complex banded distributedmatrix A(1:n, ja:ja+n-1) using partial pivoting with row interchanges.

The resulting factorization is not the same factorization as returned from the LAPACK routine?gbtrf. Additional permutations are performed on the matrix for the sake of parallelism.


A(1:n, ja:ja+n-1) = P*L*U*Q

where P and Q are permutation matrices, and L and U are banded lower and upper triangularmatrices, respectively. The matrix Q represents reordering of columns for the sake of parallelism,while P represents reordering of rows for numerical stability using classic partial pivoting.

Input Parameters

(global) INTEGER. The number of rows and columns in the

distributed submatrix A(1:n, ja:ja+n-1); n ≥ 0.

n

(global) INTEGER. The number of sub-diagonals within theband of A

bwl

( 0 ≤ bwl ≤ n-1 ).

(global) INTEGER. The number of super-diagonals withinthe band of A

bwu

( 0 ≤ bwu ≤ n-1 ).

(local)aREAL for psgbtrf

1596


DOUBLE PRECISION for pdgbtrfCOMPLEX for pcgbtrfDOUBLE COMPLEX for pzgbtrf.Pointer into the local memory to an array of local dimension(lld_a, LOCc(ja+n-1) where

lld_a ≥ 2*bwl + 2*bwu +1.Contains the local pieces of the n-by-n distributed bandedmatrix A(1:n, ja:ja+n-1) to be factored.

(global) INTEGER. The index in the global array A that pointsto the start of the matrix to be operated on ( which may beeither all of A or a submatrix of A).

ja


desca

If desca(dtype_) = 501, then dlen_ ≥ 7;

else if desca(dtype_) = 1, then dlen_ ≥ 9.

(local) INTEGER. The dimension of the array af.laf

Must be laf ≥(NB+bwu)*(bwl+bwu)+6*(bwl+bwu)*(bwl+2*bwu).If laf is not large enough, an error code will be returnedand the minimum acceptable size will be returned in af(1).

(local) Same type as a. Workspace array of dimension lwork.

work

(local or global) INTEGER. The size of the work array (lwork

≥ 1). If lwork is too small, the minimal acceptable size willbe returned in work(1) and an error code is returned.

lwork

Output Parameters

On exit, this array contains details of the factorization. Notethat additional permutations are performed on the matrix,so that the factors returned are different from those returnedby LAPACK.

a

(local) INTEGER array.ipiv

The dimension of ipiv must be ≥ desca(NB).

1597


Contains pivot indices for local factorizations. Note that youshould not alter the contents of this array betweenfactorization and solve.

(local)afREAL for psgbtrfDOUBLE PRECISION for pdgbtrfCOMPLEX for pcgbtrfDOUBLE COMPLEX for pzgbtrf.Array, dimension (laf).Auxiliary Fillin space. Fillin is created during the factorizationroutine p?gbtrf and this is stored in af.Note that if a linear system is to be solved using p?gbtrsafter the factorization routine, af must not be altered afterthe factorization.

On exit, work(1) contains the minimum value of lworkrequired for optimum performance.

work(1)

(global) INTEGER.infoIf info=0, the execution is successful.info < 0:If the ith argument is an array and the jth entry had anillegal value, then info = -(i*100+j); if the ith argumentis a scalar and had an illegal value, then info = -i.info > 0:

If info = k ≤ NPROCS, the submatrix stored on processorinfo and factored locally was not nonsingular, and thefactorization was not completed. If info = k > NPROCS,the submatrix stored on processor info-NPROCSrepresenting interactions with other processors was notnonsingular, and the factorization was not completed.

1598

p?dbtrfComputes the LU factorization of a n-by-ndiagonally dominant-like banded distributed matrix.

Syntax

call psdbtrf(n, bwl, bwu, a, ja, desca, af, laf, work, lwork, info)

call pddbtrf(n, bwl, bwu, a, ja, desca, af, laf, work, lwork, info)

call pcdbtrf(n, bwl, bwu, a, ja, desca, af, laf, work, lwork, info)

call pzdbtrf(n, bwl, bwu, a, ja, desca, af, laf, work, lwork, info)

Description

The routine computes the LU factorization of a n-by-n real/complex diagonally dominant-likebanded distributed matrix A(1:n, ja:ja+n-1) without pivoting.

Note that the resulting factorization is not the same factorization as returned from LAPACK.Additional permutations are performed on the matrix for the sake of parallelism.

Input Parameters

(global) INTEGER. The number of rows and columns in the

distributed submatrix A(1:n, ja:ja+n-1); n ≥ 0.

n

(global) INTEGER. The number of sub-diagonals within theband of A

bwl

(0 ≤ bwl ≤ n-1).

(global) INTEGER. The number of super-diagonals withinthe band of A

bwu

(0 ≤ bwu ≤ n-1).

(local)aREAL for psdbtrfDOUBLE PRECISION for pddbtrfCOMPLEX for pcdbtrfDOUBLE COMPLEX for pzdbtrf.Pointer into the local memory to an array of local dimension(lld_a,LOCc(ja+n-1)).

1599


Contains the local pieces of the n-by-n distributed bandedmatrix A(1:n, ja:ja+n-1) to be factored.


ja


desca




Must be laf ≥ NB*(bwl+bwu)+*(max(bwl,bwu))2 .If laf is not large enough, an error code will be returnedand the minimum acceptable size will be returned in af(1).


work

(local or global) INTEGER. The size of the work array, must

be lwork ≥ (max(bwl,bwu))2. If lwork is too small, theminimal acceptable size will be returned in work(1) and anerror code is returned.

lwork

Output Parameters

On exit, this array contains details of the factorization. Notethat additional permutations are performed on the matrix,so that the factors returned are different from those returnedby LAPACK.

a

(local)afREAL for psdbtrfDOUBLE PRECISION for pddbtrfCOMPLEX for pcdbtrfDOUBLE COMPLEX for pzdbtrf.Array, dimension (laf).Auxiliary Fillin space. Fillin is created during the factorizationroutine p?dbtrf and this is stored in af.

1600


Note that if a linear system is to be solved using p?dbtrsafter the factorization routine, af must not be altered afterthe factorization.


work(1)

(global) INTEGER.infoIf info=0, the execution is successful.info < 0:If the ith argument is an array and the j-th entry had anillegal value, then info = -(i*100+j); if the i-thargument is a scalar and had an illegal value, then info =-i. info > 0:

If info = k ≤ NPROCS, the submatrix stored on processorinfo and factored locally was not diagonally dominant-like,and the factorization was not completed. If info = k >NPROCS, the submatrix stored on processor info-NPROCSrepresenting interactions with other processors was notnonsingular, and the factorization was not completed.

p?potrfComputes the Cholesky factorization of asymmetric (Hermitian) positive-definite distributedmatrix.

Syntax

call pspotrf(uplo, n, a, ia, ja, desca, info)

call pdpotrf(uplo, n, a, ia, ja, desca, info)

call pcpotrf(uplo, n, a, ia, ja, desca, info)

call pzpotrf(uplo, n, a, ia, ja, desca, info)

Description

This routine computes the Cholesky factorization of a real symmetric or complex Hermitianpositive-definite distributed n-by-n matrix A(ia:ia+n-1, ja:ja+n-1), denoted below assub(A).

1601


sub(A) = UH*U if uplo='U', or

sub(A) = L*LH if uplo='L'


Input Parameters

(global) CHARACTER*1.uploIndicates whether the upper or lower triangular part ofsub(A) is stored. Must be 'U' or 'L'.If uplo = 'U', the array a stores the upper triangular partof the matrix sub(A) that is factored as UH*U.If uplo = 'L', the array a stores the lower triangular partof the matrix sub(A) that is factored as L*LH.

(global) INTEGER. The order of the distributed submatrix

sub(A) (n≥0).

n

(local)aREAL for pspotrfDOUBLE PRECISON for pdpotrfCOMPLEX for pcpotrfDOUBLE COMPLEX for pzpotrf.Pointer into the local memory to an array of dimension(lld_a, LOCc(ja+n-1)).On entry, this array contains the local pieces of the n-by-nsymmetric/Hermitian distributed matrix sub(A) to befactored.Depending on uplo, the array a contains either the upperor the lower triangular part of the matrix sub(A) (see uplo).

(global) INTEGER. The row and column indices in the globalarray A indicating the first row and the first column of thesubmatrix sub(A), respectively.

ia, ja


desca

1602


Output Parameters

The upper or lower triangular part of a is overwritten by theCholesky factor U or L, as specified by uplo.

a

(global) INTEGER.infoIf info=0, the execution is successful;info < 0: if the i-th argument is an array, and the j-thentry had an illegal value, then info = -(i*100+j); if thei-th argument is a scalar and had an illegal value, theninfo = -i.If info = k >0, the leading minor of order k,A(ia:ia+k-1, ja:ja+k-1), is not positive-definite, andthe factorization could not be completed.

p?pbtrfComputes the Cholesky factorization of asymmetric (Hermitian) positive-definite bandeddistributed matrix.

Syntax

call pspbtrf(uplo, n, bw, a, ja, desca, af, laf, work, lwork, info)

call pdpbtrf(uplo, n, bw, a, ja, desca, af, laf, work, lwork, info)

call pcpbtrf(uplo, n, bw, a, ja, desca, af, laf, work, lwork, info)

call pzpbtrf(uplo, n, bw, a, ja, desca, af, laf, work, lwork, info)

Description

This routine computes the Cholesky factorization of an n-by-n real symmetric or complexHermitian positive-definite banded distributed matrix A(1:n, ja:ja+n-1).

The resulting factorization is not the same factorization as returned from LAPACK. Additionalpermutations are performed on the matrix for the sake of parallelism.

The factorization has the form:

A(1:n, ja:ja+n-1) = P*UH*PT, if uplo='U', or

A(1:n, ja:ja+n-1) = P*L*LH*PT, if uplo='L',

1603

where P is a permutation matrix and U and L are banded upper and lower triangular matrices,respectively.

Input Parameters

(global) CHARACTER*1. Must be 'U' or 'L'.uploIf uplo = 'U', upper triangle of A(1:n, ja:ja+n-1) isstored;If uplo = 'L', lower triangle of A(1:n, ja:ja+n-1) isstored.

(global) INTEGER. The order of the distributed submatrixA(1:n, ja:ja+n-1).

n

(n≥0).

(global) INTEGER.bwThe number of superdiagonals of the distributed matrix ifuplo = 'U', or the number of subdiagonals if uplo = 'U'

(bw≥0).

(local)aREAL for pspbtrfDOUBLE PRECISON for pdpbtrfCOMPLEX for pcpbtrfDOUBLE COMPLEX for pzpbtrf.Pointer into the local memory to an array of dimension(lld_a,LOCc(ja+n-1)).On entry, this array contains the local pieces of the upperor lower triangle of the symmetric/Hermitian banddistributed matrix A(1:n, ja:ja+n-1) to be factored.


ja


desca




Must be laf ≥ (NB+2*bw)*bw.

1604


If laf is not large enough, an error code will be returnedand the minimum acceptable size will be returned in af(1).


work


be lwork ≥ bw2.

lwork

Output Parameters

On exit, if info=0, contains the permuted triangular factorU or L from the Cholesky factorization of the band matrixA(1:n, ja:ja+n-1), as specified by uplo.

a

(local)afREAL for pspbtrfDOUBLE PRECISON for pdpbtrfCOMPLEX for pcpbtrfDOUBLE COMPLEX for pzpbtrf.Array, dimension (laf). Auxiliary Fillin space. Fillin is createdduring the factorization routine p?pbtrf and this is storedin af. Note that if a linear system is to be solved usingp?pbtrs after the factorization routine, af must not bealtered.


work(1)

(global) INTEGER.infoIf info=0, the execution is successful.info < 0:If the ith argument is an array and the jth entry had anillegal value, then info = -(i*100+j); if the ith argumentis a scalar and had an illegal value, then info = -i.info>0:

If info = k ≤ NPROCS, the submatrix stored on processorinfo and factored locally was not positive definite, and thefactorization was not completed.

1605

If info = k > NPROCS, the submatrix stored on processorinfo-NPROCS representing interactions with otherprocessors was not nonsingular, and the factorization wasnot completed.

p?pttrfComputes the Cholesky factorization of asymmetric (Hermitian) positive-definite tridiagonaldistributed matrix.

Syntax

call pspttrf(n, d, e, ja, desca, af, laf, work, lwork, info)

call pdpttrf(n, d, e, ja, desca, af, laf, work, lwork, info)

call pcpttrf(n, d, e, ja, desca, af, laf, work, lwork, info)

call pzpttrf(n, d, e, ja, desca, af, laf, work, lwork, info)

Description

This routine computes the Cholesky factorization of an n-by-n real symmetric or complexhermitian positive-definite tridiagonal distributed matrix A(1:n, ja:ja+n-1).



A(1:n, ja:ja+n-1) = P*L*D*LH*PT, or

A(1:n, ja:ja+n-1) = P*UH*D*U*PT,

where P is a permutation matrix, and U and L are tridiagonal upper and lower triangular matrices,respectively.

Input Parameters

(global) INTEGER. The order of the distributed submatrixA(1:n, ja:ja+n-1)

n

(n ≥ 0).

(local)d, e

1606


REAL for pspttrfDOUBLE PRECISON for pdpttrfCOMPLEX for pcpttrfDOUBLE COMPLEX for pzpttrf.Pointers into the local memory to arrays of dimension(desca(nb_)) each.On entry, the array d contains the local part of the globalvector storing the main diagonal of the distributed matrixA.On entry, the array e contains the local part of the globalvector storing the upper diagonal of the distributed matrixA.

(global) INTEGER. The index in the global array A that pointsto the start of the matrix to be operated on (which may beeither all of A or a submatrix of A).

ja


desca




Must be laf ≥ NB+2.If laf is not large enough, an error code will be returnedand the minimum acceptable size will be returned in af(1).

(local) Same type as d and e. Workspace array of dimensionlwork .

work

(local or global) INTEGER. The size of the work array, mustbe at least

lwork

lwork ≥ 8*NPCOL.

Output Parameters

On exit, overwritten by the details of the factorization.d, e

(local)afREAL for pspttrfDOUBLE PRECISION for pdpttrfCOMPLEX for pcpttrf

1607


DOUBLE COMPLEX for pzpttrf.Array, dimension (laf).Auxiliary Fillin space. Fillin is created during the factorizationroutine p?pttrf and this is stored in af.Note that if a linear system is to be solved using p?pttrsafter the factorization routine, af must not be altered.


work(1)

(global) INTEGER.infoIf info=0, the execution is successful.info < 0:If the i-th argument is an array and the j-th entry had anillegal value, then info = -(i*100+j); if the i-thargument is a scalar and had an illegal value, then info =-i.info > 0:

If info = k ≤ NPROCS, the submatrix stored on processorinfo and factored locally was not positive definite, and thefactorization was not completed.If info = k > NPROCS, the submatrix stored on processorinfo-NPROCS representing interactions with otherprocessors was not nonsingular, and the factorization wasnot completed.

p?dttrfComputes the LU factorization of a diagonallydominant-like tridiagonal distributed matrix.

Syntax

call psdttrf(n, dl, d, du, ja, desca, af, laf, work, lwork, info)

call pddttrf(n, dl, d, du, ja, desca, af, laf, work, lwork, info)

call pcdttrf(n, dl, d, du, ja, desca, af, laf, work, lwork, info)

call pzdttrf(n, dl, d, du, ja, desca, af, laf, work, lwork, info)

1608

Description

This routine computes the LU factorization of an n-by-n real/complex diagonally dominant-liketridiagonal distributed matrix A(1:n, ja:ja+n-1) without pivoting for stability.



A(1:n, ja:ja+n-1) = P*L*U*PT,

where P is a permutation matrix, and L and U are banded lower and upper triangular matrices,respectively.

Input Parameters

(global) INTEGER. The number of rows and columns to beoperated on, that is, the order of the distributed submatrix

A(1:n, ja:ja+n-1) (n ≥ 0).

n

(local)dl, d, duREAL for pspttrfDOUBLE PRECISON for pdpttrfCOMPLEX for pcpttrfDOUBLE COMPLEX for pzpttrf.Pointers to the local arrays of dimension (desca(nb_))each.On entry, the array dl contains the local part of the globalvector storing the subdiagonal elements of the matrix.Globally, dl(1) is not referenced, and dl must be alignedwith d.On entry, the array d contains the local part of the globalvector storing the diagonal elements of the matrix.On entry, the array du contains the local part of the globalvector storing the super-diagonal elements of the matrix.du(n) is not referenced, and du must be aligned with d.


ja

1609


(global and local) INTEGER array, dimension (dlen_). Thearray descriptor for the distributed matrix A. If

desca(dtype_) = 501, then dlen_ ≥ 7;

desca



Must be laf ≥ 2*(NB+2) .If laf is not large enough, an error code will be returnedand the minimum acceptable size will be returned in af(1).

(local) Same type as d. Workspace array of dimension lwork.

work


be at least lwork ≥ 8*NPCOL.

lwork

Output Parameters

On exit, overwritten by the information containing thefactors of the matrix.

dl, d, du

(local)afREAL for psdttrfDOUBLE PRECISION for pddttrfCOMPLEX for pcdttrfDOUBLE COMPLEX for pzdttrf.Array, dimension (laf).Auxiliary Fillin space. Fillin is created during the factorizationroutine p?dttrf and this is stored in af.Note that if a linear system is to be solved using p?dttrsafter the factorization routine, af must not be altered.


work(1)

(global) INTEGER.infoIf info=0, the execution is successful.info < 0:If the ith argument is an array and the j-th entry had anillegal value, then info = -(i*100+j); if the i-thargument is a scalar and had an illegal value, then info =-i.

1610

info > 0:

If info = k ≤ NPROCS, the submatrix stored on processorinfo and factored locally was not diagonally dominant-like,and the factorization was not completed. If info = k >NPROCS, the submatrix stored on processor info-NPROCSrepresenting interactions with other processors was notnonsingular, and the factorization was not completed.

Routines for Solving Systems of Linear Equations

This section describes the ScaLAPACK routines for solving systems of linear equations. Beforecalling most of these routines, you need to factorize the matrix of your system of equations(see Routines for Matrix Factorization in this chapter). However, the factorization is not necessaryif your system of equations has a triangular matrix.

p?getrsSolves a system of distributed linear equations witha general square matrix, using the LU factorizationcomputed by p?getrf.

Syntax

call psgetrs(trans, n, nrhs, a, ia, ja, desca, ipiv, b, ib, jb, descb, info)

call pdgetrs(trans, n, nrhs, a, ia, ja, desca, ipiv, b, ib, jb, descb, info)

call pcgetrs(trans, n, nrhs, a, ia, ja, desca, ipiv, b, ib, jb, descb, info)

call pzgetrs(trans, n, nrhs, a, ia, ja, desca, ipiv, b, ib, jb, descb, info)

Description

This routine solves a system of distributed linear equations with a general n-by-n distributedmatrix sub(A) = A(ia:ia+n-1, ja:ja+n-1) using the LU factorization computed by p?getrf.

The system has one of the following forms specified by trans:

sub(A)*X = sub(B) (no transpose),

sub(A)T*X = sub(B) (transpose),

sub(A)H*X = sub(B) (conjugate transpose),

1611


where sub(B) = B(ib:ib+n-1, jb:jb+nrhs-1).

Before calling this routine, you must call p?getrf to compute the LU factorization of sub(A).

Input Parameters

(global) CHARACTER*1. Must be 'N' or 'T' or 'C'.transIndicates the form of the equations:If trans = 'N', then sub(A)*X = sub(B) is solved for X.If trans = 'T', then sub(A)T*X = sub(B) is solved forX.If trans = 'C', then sub(A)H *X = sub(B) is solved forX.

(global) INTEGER. The number of linear equations; the order

of the submatrix sub(A) (n≥0).

n

(global) INTEGER. The number of right hand sides; thenumber of columns of the distributed submatrix sub(B)

(nrhs≥0).

nrhs

(global)a, bREAL for psgetrsDOUBLE PRECISION for pdgetrsCOMPLEX for pcgetrsDOUBLE COMPLEX for pzgetrs.Pointers into the local memory to arrays of local dimensiona(lld_a, LOCc(ja+n-1)) and b(lld_b,LOCc(jb+nrhs-1)), respectively.On entry, the array a contains the local pieces of the factorsL and U from the factorization sub(A) = P*L*U; the unitdiagonal elements of L are not stored. On entry, the arrayb contains the right hand sides sub(B).


ia, ja


desca


1612


The dimension of ipiv is (LOCr(m_a) + mb_a). This arraycontains the pivoting information: local row i of the matrixwas interchanged with the global row ipiv(i).This array is tied to the distributed matrix A.

(global) INTEGER. The row and column indices in the globalarray B indicating the first row and the first column of thesubmatrix sub(B), respectively.

ib, jb

(global and local) INTEGER array, dimension (dlen_). Thearray descriptor for the distributed matrix B.

descb

Output Parameters

On exit, overwritten by the solution distributed matrix X.b

INTEGER. If info=0, the execution is successful. info <0:

info

If the i-th argument is an array and the j-th entry had anillegal value, then info = -(i*100+j); if the i-thargument is a scalar and had an illegal value, then info =-i.

p?gbtrsSolves a system of distributed linear equations witha general band matrix, using the LU factorizationcomputed by p?gbtrf.

Syntax

call psgbtrs(trans, n, bwl, bwu, nrhs, a, ja, desca, ipiv, b, ib, descb, af,laf, work, lwork, info)

call pdgbtrs(trans, n, bwl, bwu, nrhs, a, ja, desca, ipiv, b, ib, descb, af,laf, work, lwork, info)

call pcgbtrs(trans, n, bwl, bwu, nrhs, a, ja, desca, ipiv, b, ib, descb, af,laf, work, lwork, info)

call pzgbtrs(trans, n, bwl, bwu, nrhs, a, ja, desca, ipiv, b, ib, descb, af,laf, work, lwork, info)

1613

Description

This routine solves a system of distributed linear equations with a general band distributedmatrix sub(A) = A(1:n, ja:ja+n-1) using the LU factorization computed by p?gbtrf.

The system has one of the following forms specified by trans:

sub(A)*X = sub(B) (no transpose),

sub(A)T*X = sub(B) (transpose),

sub(A)H*X = sub(B) (conjugate transpose),

where sub(B) = B(ib:ib+n-1, 1:nrhs) .

Before calling this routine, you must call p?gbtrf to compute the LU factorization of sub(A).

Input Parameters

(global) CHARACTER*1. Must be 'N' or 'T' or 'C'.transIndicates the form of the equations:If trans = 'N', then sub(A)*X = sub(B) is solved for X.If trans = 'T', then sub(A)T*X = sub(B) is solved forX.If trans = 'C', then sub(A)H *X = sub(B) is solved forX.


of the distributed submatrix sub(A) (n ≥ 0).

n

(global) INTEGER. The number of sub-diagonals within the

band of A ( 0 ≤ bwl ≤ n-1 ).

bwl

(global) INTEGER. The number of super-diagonals within

the band of A ( 0 ≤ bwu ≤ n-1 ).

bwu


(nrhs ≥ 0).

nrhs

(global)a, bREAL for psgbtrsDOUBLE PRECISION for pdgbtrsCOMPLEX for pcgbtrsDOUBLE COMPLEX for pzgbtrs.

1614


Pointers into the local memory to arrays of local dimensiona(lld_a,LOCc(ja+n-1)) and b(lld_b,LOCc(nrhs)),respectively.The array a contains details of the LU factorization of thedistributed band matrix A.On entry, the array b contains the local pieces of the righthand sides B(ib:ib+n-1, 1:nrhs).


ja


desca




ib


descb




Must be laf ≥ NB*(bwl+bwu)+6*(bwl+bwu)*(bwl+2*bwu).If laf is not large enough, an error code will be returnedand the minimum acceptable size will be returned in af(1).


work


be at least lwork ≥ nrhs*(NB+2*bwl+4*bwu).

lwork

Output Parameters


The dimension of ipiv must be ≥ desca(NB).

1615


Contains pivot indices for local factorizations. Note that youshould not alter the contents of this array betweenfactorization and solve.

On exit, overwritten by the local pieces of the solutiondistributed matrix X.

b

(local)afREAL for psgbtrsDOUBLE PRECISION for pdgbtrsCOMPLEX for pcgbtrsDOUBLE COMPLEX for pzgbtrs.Array, dimension (laf).Auxiliary Fillin space. Fillin is created during the factorizationroutine p?gbtrf and this is stored in af.Note that if a linear system is to be solved using p?gbtrsafter the factorization routine, af must not be altered afterthe factorization.


work(1)

INTEGER. If info=0, the execution is successful.infoinfo < 0:If the i-th argument is an array and the jth entry had anillegal value, then info = -(i*100+j); if the i-thargument is a scalar and had an illegal value, then info =-i.

p?potrsSolves a system of linear equations with aCholesky-factored symmetric/Hermitian distributedpositive-definite matrix.

Syntax

call pspotrs(uplo, n, nrhs, a, ia, ja, desca, b, ib, jb, descb, info)

call pdpotrs(uplo, n, nrhs, a, ia, ja, desca, b, ib, jb, descb, info)

call pcpotrs(uplo, n, nrhs, a, ia, ja, desca, b, ib, jb, descb, info)

call pzpotrs(uplo, n, nrhs, a, ia, ja, desca, b, ib, jb, descb, info)

1616

Description

The routine p?potrs solves for X a system of distributed linear equations in the form:

sub(A)*X = sub(B) ,

where sub(A) = A(ia:ia+n-1, ja:ja+n-1) is an n-by-n real symmetric or complex Hermitianpositive definite distributed matrix, and sub(B) denotes the distributed matrix B(ib:ib+n-1,jb:jb+nrhs-1).

This routine uses Cholesky factorization

sub(A) = UH*U, or sub(A) = L*LH

computed by p?potrf.

Input Parameters

(global) CHARACTER*1. Must be 'U' or 'L'.uploIf uplo = 'U', upper triangle of sub(A) is stored;If uplo = 'L', lower triangle of sub(A) is stored.


sub(A) (n≥0).

n


(nrhs≥0).

nrhs

(local)a, bREAL for pspotrsDOUBLE PRECISION for pdpotrsCOMPLEX for pcpotrsDOUBLE COMPLEX for pzpotrs.Pointers into the local memory to arrays of local dimensiona(lld_a,LOCc(ja+n-1)) andb(lld_b,LOCc(jb+nrhs-1)), respectively.The array a contains the factors L or U from the Choleskyfactorization sub(A) = L*LH or sub(A) = UH*U, as computedby p?potrf.On entry, the array b contains the local pieces of the righthand sides sub(B).

1617


ia, ja


desca


ib, jb


descb

Output Parameters

Overwritten by the local pieces of the solution matrix X.b

INTEGER. If info=0, the execution is successful.infoinfo < 0: if the i-th argument is an array and the j-thentry had an illegal value, then info = -(i*100+j); if thei-th argument is a scalar and had an illegal value, theninfo = -i.

p?pbtrsSolves a system of linear equations with aCholesky-factored symmetric/Hermitianpositive-definite band matrix.

Syntax

call pspbtrs( uplo, n, bw, nrhs, a, ja, desca, b, ib, descb, af, laf, work,lwork, info)

call pdpbtrs( uplo, n, bw, nrhs, a, ja, desca, b, ib, descb, af, laf, work,lwork, info)

call pcpbtrs( uplo, n, bw, nrhs, a, ja, desca, b, ib, descb, af, laf, work,lwork, info)

call pzpbtrs( uplo, n, bw, nrhs, a, ja, desca, b, ib, descb, af, laf, work,lwork, info)

1618

Description

The routine p?pbtrs solves for X a system of distributed linear equations in the form:

sub(A)*X = sub(B) ,

where sub(A) = A(1:n, ja:ja+n-1) is an n-by-n real symmetric or complex Hermitianpositive definite distributed band matrix, and sub(B) denotes the distributed matrixB(ib:ib+n-1, 1:nrhs).

This routine uses Cholesky factorization

sub(A) = P*UH*U*PT, or sub(A) = P*L*LH*PT

computed by p?pbtrf.

Input Parameters

(global) CHARACTER*1. Must be 'U' or 'L'.uploIf uplo = 'U', upper triangle of sub(A) is stored;If uplo = 'L', lower triangle of sub(A) is stored.


sub(A) (n≥0).

n

(global) INTEGER. The number of superdiagonals of thedistributed matrix if uplo = 'U', or the number of

subdiagonals if uplo = 'L' (bw≥0).

bw


(nrhs≥0).

nrhs

(local)a, bREAL for pspbtrsDOUBLE PRECISION for pdpbtrsCOMPLEX for pcpbtrsDOUBLE COMPLEX for pzpbtrs.Pointers into the local memory to arrays of local dimensiona(lld_a,LOCc(ja+n-1)) and b(lld_b,LOCc(nrhs-1)),respectively.

1619


The array a contains the permuted triangular factor U or Lfrom the Cholesky factorization sub(A) = P*UH*U*PT, orsub(A) = P*L*LH*PT of the band matrix A, as returned byp?pbtrf.On entry, the array b contains the local pieces of then-by-nrhs right hand side distributed matrix sub(B).


ja


desca



(global) INTEGER. The row index in the global array Bindicating the first row of the submatrix sub(B).

ib


descb

If descb(dtype_) = 502, then dlen_ ≥ 7;

else if descb(dtype_) = 1, then dlen_ ≥ 9.

(local) Arrays, same type as a.af, workThe array af is of dimension (laf). It contains auxiliaryFillin space. Fillin is created during the factorization routinep?dbtrf and this is stored in af.The array work is a workspace array of dimension lwork.


Must be laf ≥ nrhs*bw.If laf is not large enough, an error code will be returnedand the minimum acceptable size will be returned in af(1).

(local or global) INTEGER. The size of the array work, must

be at least lwork ≥ bw2.

lwork

Output Parameters

On exit, if info=0, this array contains the local pieces ofthe n-by-nrhs solution distributed matrix X.

b

1620


work(1)

INTEGER. If info=0, the execution is successful.infoinfo < 0:If the i-th argument is an array and the j-th entry had anillegal value, then info = -(i*100+j); if the i-thargument is a scalar and had an illegal value, then info =-i.

p?pttrsSolves a system of linear equations with asymmetric (Hermitian) positive-definite tridiagonaldistributed matrix using the factorization computedby p?pttrf.

Syntax

call pspttrs(n, nrhs, d, e, ja, desca, b, ib, descb, af, laf, work, lwork,info)

call pdpttrs(n, nrhs, d, e, ja, desca, b, ib, descb, af, laf, work, lwork,info)

call pcpttrs(uplo, n, nrhs, d, e, ja, desca, b, ib, descb, af, laf, work,lwork, info)

call pzpttrs(uplo, n, nrhs, d, e, ja, desca, b, ib, descb, af, laf, work,lwork, info)

Description

The routine p?pttrs solves for X a system of distributed linear equations in the form:

sub(A)*X = sub(B) ,

where sub(A) = A(1:n, ja:ja+n-1) is an n-by-n real symmetric or complex Hermitianpositive definite tridiagonal distributed matrix, and sub(B) denotes the distributed matrixB(ib:ib+n-1, 1:nrhs).

This routine uses the factorization

sub(A) = P*L*D*LH*PT, or sub(A) = P*UH*D*U*PT

1621

computed by p?pttrf.

Input Parameters

(global, used in complex flavors only)uploCHARACTER*1. Must be 'U' or 'L'.If uplo = 'U', upper triangle of sub(A) is stored;If uplo = 'L', lower triangle of sub(A) is stored.


sub(A) (n≥0).

n


(nrhs≥0).

nrhs

(local)d, eREAL for pspttrsDOUBLE PRECISON for pdpttrsCOMPLEX for pcpttrsDOUBLE COMPLEX for pzpttrs.Pointers into the local memory to arrays of dimension(desca(nb_)) each.These arrays contain details of the factorization as returnedby p?pttrf


ja


desca

If desca(dtype_) = 501 or 502, then dlen_ ≥ 7;


(local) Same type as d, e.bPointer into the local memory to an array of local dimensionb(lld_b, LOCc(nrhs)).On entry, the array b contains the local pieces of then-by-nrhs right hand side distributed matrix sub(B).

1622


(global) INTEGER. The row index in the global array B thatpoints to the first row of the matrix to be operated on (whichmay be either all of B or a submatrix of B).

ib


descb



(local) REAL for pspttrsaf, workDOUBLE PRECISION for pdpttrsCOMPLEX for pcpttrsDOUBLE COMPLEX for pzpttrs.Arrays of dimension (laf) and (lwork), respectively Thearray af contains auxiliary Fillin space. Fillin is createdduring the factorization routine p?pttrf and this is storedin af.The array work is a workspace array.


Must be laf ≥ NB+2.If laf is not large enough, an error code is returned andthe minimum acceptable size will be returned in af(1).

(local or global) INTEGER. The size of the array work, mustbe at least

lwork

lwork ≥ (10+2*min(100,nrhs))*NPCOL+4*nrhs.

Output Parameters

On exit, this array contains the local pieces of the solutiondistributed matrix X.

b


work(1)


info

if the i-th argument is an array and the j-th entry had anillegal value, then info = -(i*100+j); if the ith argumentis a scalar and had an illegal value, then info = -i.

1623


p?dttrsSolves a system of linear equations with adiagonally dominant-like tridiagonal distributedmatrix using the factorization computed byp?dttrf.

Syntax

call psdttrs(trans, n, nrhs, dl, d, du, ja, desca, b, ib, descb, af, laf,work, lwork, info)

call pddttrs(trans, n, nrhs, dl, d, du, ja, desca, b, ib, descb, af, laf,work, lwork, info)

call pcdttrs(trans, n, nrhs, dl, d, du, ja, desca, b, ib, descb, af, laf,work, lwork, info)

call pzdttrs(trans, n, nrhs, dl, d, du, ja, desca, b, ib, descb, af, laf,work, lwork, info)

Description

The routine p?dttrs solves for X one of the systems of equations:

sub(A)*X = sub(B),

(sub(A))T*X = sub(B), or

(sub(A))H*X = sub(B),

where sub(A) = A(1:n, ja:ja+n-1) is a diagonally dominant-like tridiagonal distributedmatrix, and sub(B) denotes the distributed matrix B(ib:ib+n-1, 1:nrhs).

This routine uses the LU factorization computed by p?dttrf.

Input Parameters

(global) CHARACTER*1. Must be 'N' or 'T' or 'C'.transIndicates the form of the equations:If trans = 'N', then sub(A)*X = sub(B) is solved for X.If trans = 'T', then (sub(A))T*X = sub(B) is solvedfor X.If trans = 'C', then (sub(A))H*X = sub(B) is solvedfor X.

1624



sub(A) (n ≥ 0).

n


(nrhs ≥ 0).

nrhs

(local)dl, d, duREAL for psdttrsDOUBLE PRECISON for pddttrsCOMPLEX for pcdttrsDOUBLE COMPLEX for pzdttrs.Pointers to the local arrays of dimension (desca(nb_))each.On entry, these arrays contain details of the factorization.Globally, dl(1) and du(n) are not referenced; dl and dumust be aligned with d.


ja

(global and local) INTEGER array, dimension (dlen_). Thearray descriptor for the distributed matrix A. If

desca(dtype_) = 501 or 502, then dlen_ ≥ 7;

desca


(local) Same type as d.bPointer into the local memory to an array of local dimensionb(lld_b,LOCc(nrhs)).On entry, the array b contains the local pieces of then-by-nrhs right hand side distributed matrix sub(B).


ib


descb



1625


(local) REAL for psdttrsaf, workDOUBLE PRECISION for pddttrsCOMPLEX for pcdttrsDOUBLE COMPLEX for pzdttrs.Arrays of dimension (laf) and (lwork), respectively.The array af contains auxiliary Fillin space. Fillin is createdduring the factorization routine p?dttrf and this is storedin af. If a linear system is to be solved using p?dttrsafterthe factorization routine, af must not be altered.The array work is a workspace array.


Must be laf ≥ NB*(bwl+bwu)+6*(bwl+bwu)*(bwl+2*bwu).



be at least lwork ≥ 10*NPCOL+4*nrhs.

lwork

Output Parameters


b


work(1)


info

if the ith argument is an array and the j-th entry had anillegal value, then info = -(i*100+j); if the i-thargument is a scalar and had an illegal value, then info =-i.

1626


p?dbtrsSolves a system of linear equations with adiagonally dominant-like banded distributed matrixusing the factorization computed by p?dbtrf.

Syntax

call psdbtrs(trans, n, bwl, bwu, nrhs, a, ja, desca, b, ib, descb, af, laf,work, lwork, info)

call pddbtrs(trans, n, bwl, bwu, nrhs, a, ja, desca, b, ib, descb, af, laf,work, lwork, info)

call pcdbtrs(trans, n, bwl, bwu, nrhs, a, ja, desca, b, ib, descb, af, laf,work, lwork, info)

call pzdbtrs(trans, n, bwl, bwu, nrhs, a, ja, desca, b, ib, descb, af, laf,work, lwork, info)

Description

The routine p?dbtrs solves for X one of the systems of equations:

sub(A)*X = sub(B),



where sub(A) = A(1:n, ja:ja+n-1) is a diagonally dominant-like banded distributed matrix,and sub(B) denotes the distributed matrix B(ib:ib+n-1, 1:nrhs).

This routine uses the LU factorization computed by p?dbtrf.

Input Parameters

(global) CHARACTER*1. Must be 'N' or 'T' or 'C'.transIndicates the form of the equations:If trans = 'N', then sub(A)*X = sub(B) is solved for X.If trans = 'T', then (sub(A))T*X = sub(B) is solvedfor X.If trans = 'C', then (sub(A))H*X = sub(B) is solvedfor X.

1627



sub(A) (n ≥ 0).

n

(global) INTEGER. The number of subdiagonals within theband of A

bwl

( 0 ≤ bwl ≤ n-1 ).

(global) INTEGER. The number of superdiagonals within theband of A

bwu

( 0 ≤ bwu ≤ n-1 ).


(nrhs ≥ 0).

nrhs

(local)a, bREAL for psdbtrsDOUBLE PRECISON for pddbtrsCOMPLEX for pcdbtrsDOUBLE COMPLEX for pzdbtrs.Pointers into the local memory to arrays of local dimensiona(lld_a,LOCc(ja+n-1)) and b(lld_b,LOCc(nrhs)),respectively.On entry, the array a contains details of the LU factorizationof the band matrix A, as computed by p?dbtrf.On entry, the array b contains the local pieces of the righthand side distributed matrix sub(B).


ja


desca




ib

1628



descb



(local)af, workREAL for psdbtrsDOUBLE PRECISION for pddbtrsCOMPLEX for pcdbtrsDOUBLE COMPLEX for pzdbtrs.Arrays of dimension (laf) and (lwork), respectively Thearray af contains auxiliary Fillin space. Fillin is createdduring the factorization routine p?dbtrf and this is storedin af.The array work is a workspace array.


Must be laf ≥ NB*(bwl+bwu)+6*(max(bwl,bwu))2 .


(local or global) INTEGER. The size of the array work, mustbe at least

lwork

lwork ≥ (max(bwl,bwu))2.

Output Parameters


b


work(1)


info

if the ith argument is an array and the j-th entry had anillegal value, then info = -(i*100+j); if the i-thargument is a scalar and had an illegal value, then info =-i.

1629


p?trtrsSolves a system of linear equations with atriangular distributed matrix.

Syntax

call pstrtrs(uplo, trans, diag, n, nrhs, a, ia, ja, desca, b, ib, jb, descb,info)

call pdtrtrs(uplo, trans, diag, n, nrhs, a, ia, ja, desca, b, ib, jb, descb,info)

call pctrtrs(uplo, trans, diag, n, nrhs, a, ia, ja, desca, b, ib, jb, descb,info)

call pztrtrs(uplo, trans, diag, n, nrhs, a, ia, ja, desca, b, ib, jb, descb,info)

Description

This routine solves for X one of the following systems of linear equations:

sub(A)*X = sub(B),



where sub(A) = A(ia:ia+n-1, ja:ja+n-1) is a triangular distributed matrix of order n,and sub(B) denotes the distributed matrix B(ib:ib+n-1, jb:jb+nrhs-1).

A check is made to verify that sub(A) is nonsingular.

Input Parameters

(global) CHARACTER*1. Must be 'U' or 'L'.uploIndicates whether sub(A) is upper or lower triangular:If uplo = 'U', then sub(A) is upper triangular.If uplo = 'L', then sub(A) is lower triangular.

(global) CHARACTER*1. Must be 'N' or 'T' or 'C'.transIndicates the form of the equations:If trans = 'N', then sub(A)*X = sub(B) is solved for X.

1630


If trans = 'T', then sub(A)T*X = sub(B) is solved forX.If trans = 'C', then sub(A)H*X = sub(B) is solved forX.

(global) CHARACTER*1. Must be 'N' or 'U'.diagIf diag = 'N', then sub(A) is not a unit triangular matrix.If diag = 'U', then sub(A) is unit triangular.


sub(A) (n≥0).

n

(global) INTEGER. The number of right-hand sides; i.e., thenumber of columns of the distributed matrix sub(B)

(nrhs≥0).

nrhs

(local)a, bREAL for pstrtrsDOUBLE PRECISION for pdtrtrsCOMPLEX for pctrtrsDOUBLE COMPLEX for pztrtrs.Pointers into the local memory to arrays of local dimensiona(lld_a,LOCc(ja+n-1)) andb(lld_b,LOCc(jb+nrhs-1)), respectively.The array a contains the local pieces of the distributedtriangular matrix sub(A).If uplo = 'U', the leading n-by-n upper triangular part ofsub(A) contains the upper triangular matrix, and the strictlylower triangular part of sub(A) is not referenced.If uplo = 'L', the leading n-by-n lower triangular part ofsub(A) contains the lower triangular matrix, and the strictlyupper triangular part of sub(A) is not referenced.If diag = 'U', the diagonal elements of sub(A) are alsonot referenced and are assumed to be 1.On entry, the array b contains the local pieces of the righthand side distributed matrix sub(B).


ia, ja

1631


desca


ib, jb


descb

Output Parameters

On exit, if info=0, sub(B) is overwritten by the solutionmatrix X.

b

INTEGER. If info=0, the execution is successful.infoinfo < 0:if the ith argument is an array and the jth entry had anillegal value, then info = -(i*100+j); if the ith argumentis a scalar and had an illegal value, then info = -i;info > 0:if info = i, the i-th diagonal element of sub(A) is zero,indicating that the submatrix is singular and the solutionsX have not been computed.

Routines for Estimating the Condition Number

This section describes the ScaLAPACK routines for estimating the condition number of a matrix.The condition number is used for analyzing the errors in the solution of a system of linearequations. Since the condition number may be arbitrarily large when the matrix is nearlysingular, the routines actually compute the reciprocal condition number.

1632

p?geconEstimates the reciprocal of the condition numberof a general distributed matrix in either the 1-normor the infinity-norm.

Syntax

call psgecon( norm, n, a, ia, ja, desca, anorm, rcond, work, lwork, iwork,liwork, info)

call pdgecon( norm, n, a, ia, ja, desca, anorm, rcond, work, lwork, iwork,liwork, info)

call pcgecon( norm, n, a, ia, ja, desca, anorm, rcond, work, lwork, rwork,lrwork, info)

call pzgecon( norm, n, a, ia, ja, desca, anorm, rcond, work, lwork, rwork,lrwork, info)

Description

This routine estimates the reciprocal of the condition number of a general distributedreal/complex matrix sub(A) = A(ia:ia+n-1, ja:ja+n-1) in either the 1-norm orinfinity-norm, using the LU factorization computed by p?getrf.

An estimate is obtained for ||(sub(A))-1||, and the reciprocal of the condition number iscomputed as

Input Parameters

(global) CHARACTER*1. Must be '1' or 'O' or 'I'.normSpecifies whether the 1-norm condition number or theinfinity-norm condition number is required.If norm = '1' or 'O', then the 1-norm is used;If norm = 'I', then the infinity-norm is used.

1633



sub(A) (n ≥ 0).

n

(local)aREAL for psgeconDOUBLE PRECISION for pdgeconCOMPLEX for pcgeconDOUBLE COMPLEX for pzgecon.Pointer into the local memory to an array of dimensiona(lld_a,LOCc(ja+n-1)).The array a contains the local pieces of the factors L and Ufrom the factorization sub(A) = P*L*U; the unit diagonalelements of L are not stored.


ia, ja


desca

(global) REAL for single precision flavors, DOUBLEPRECISION for double precision flavors.

anorm

If norm = '1' or 'O', the 1-norm of the original distributedmatrix sub(A);If norm = 'I', the infinity-norm of the original distributedmatrix sub(A).

(local)workREAL for psgeconDOUBLE PRECISION for pdgeconCOMPLEX for pcgeconDOUBLE COMPLEX for pzgecon.The array work of dimension (lwork) is a workspace array.

(local or global) INTEGER. The dimension of the array work.lworkFor real flavors:lwork must be at least

1634


lwork ≥ 2*LOCr(n+mod(ia-1,mb_a))+2*LOCc(n+mod(ja-1,nb_a))+ max(2, max(nb_a*max(1,ceil(NPROW-1, NPCOL)),LOCc(n+mod(ja-1,nb_a))+nb_a*max(1, ceil(NPCOL-1,NPROW))).For complex flavors:lwork must be at least

lwork ≥ 2*LOCr(n+mod(ia-1,mb_a))+ max(2,max(nb_a*ceil(NPROW-1, NPCOL),LOCc(n+mod(ja-1,nb_a))+ nb_a*ceil(NPCOL-1,NPROW))).LOCr and LOCc values can be computed using the ScaLAPACKtool function numroc; NPROW and NPCOL can be determinedby calling the subroutine blacs_gridinfo.

(local) INTEGER. Workspace array, DIMENSION (liwork).Used in real flavors only.

iwork

(local or global) INTEGER. The dimension of the array iwork;used in real flavors only. Must be at least

liwork

liwork ≥ LOCr(n+mod(ia-1,mb_a)).

(local) REAL for pcgeconrworkDOUBLE PRECISION for pzgeconWorkspace array, DIMENSION (lrwork). Used in complexflavors only.

(local or global) INTEGER. The dimension of the array rwork;used in complex flavors only. Must be at least

lrwork

lrwork ≥ 2*LOCc(n+mod(ja-1,nb_a)).

Output Parameters

(global) REAL for single precision flavors.rcondDOUBLE PRECISION for double precision flavors.The reciprocal of the condition number of the distributedmatrix sub(A). See Description.


work(1)

1635


On exit, iwork(1) contains the minimum value of liworkrequired for optimum performance (for real flavors).

iwork(1)

On exit, rwork(1) contains the minimum value of lrworkrequired for optimum performance (for complex flavors).

rwork(1)

(global) INTEGER. If info=0, the execution is successful.infoinfo < 0:If the i-th argument is an array and the j-th entry had anillegal value, then info = -(i*100+j); if the i-thargument is a scalar and had an illegal value, then info =-i.

p?poconEstimates the reciprocal of the condition number(in the 1 - norm) of a symmetric / Hermitianpositive-definite distributed matrix.

Syntax

call pspocon( uplo, n, a, ia, ja, desca, anorm, rcond, work, lwork, iwork,liwork, info)

call pdpocon( uplo, n, a, ia, ja, desca, anorm, rcond, work, lwork, iwork,liwork, info)

call pcpocon( uplo, n, a, ia, ja, desca, anorm, rcond, work, lwork, rwork,lrwork, info)

call pzpocon( uplo, n, a, ia, ja, desca, anorm, rcond, work, lwork, rwork,lrwork, info)

Description

This routine estimates the reciprocal of the condition number (in the 1 - norm) of a realsymmetric or complex Hermitian positive definite distributed matrix sub(A) = A(ia:ia+n-1,ja:ja+n-1), using the Cholesky factorization sub(A) = UH*U or sub(A) = L*LH computedby p?potrf.

An estimate is obtained for ||(sub(A))-1||, and the reciprocal of the condition number iscomputed as

1636

Input Parameters

(global) CHARACTER*1. Must be 'U' or 'L'.uploSpecifies whether the factor stored in sub(A) is upper orlower triangular.If uplo = 'U', sub(A) stores the upper triangular factor Uof the Cholesky factorization sub(A) = UH*U.If uplo = 'L', sub(A) stores the lower triangular factor Lof the Cholesky factorization sub(A) = L*LH.


sub(A) (n≥0).

n

(local)aREAL for pspoconDOUBLE PRECISION for pdpoconCOMPLEX for pcpoconDOUBLE COMPLEX for pzpocon.Pointer into the local memory to an array of dimensiona(lld_a,LOCc(ja+n-1)).The array a contains the local pieces of the factors L or Ufrom the Cholesky factorization sub(A) = UH*U, or sub(A)= L*LH, as computed by p?potrf.


ia, ja


desca

(global) REAL for single precision flavors,anormDOUBLE PRECISION for double precision flavors.The 1-norm of the symmetric/Hermitian distributed matrixsub(A).

(local)work

1637


REAL for pspoconDOUBLE PRECISION for pdpoconCOMPLEX for pcpoconDOUBLE COMPLEX for pzpocon.The array work of dimension (lwork) is a workspace array.


lwork ≥ 2*LOCr(n+mod(ia-1,mb_a))+2*LOCc(n+mod(ja-1,nb_a))+ max(2,max(nb_a*ceil(NPROW-1, NPCOL),LOCc(n+mod(ja-1,nb_a))+nb_a*ceil(NPCOL-1, NPROW))).For complex flavors:lwork must be at least

lwork ≥ 2*LOCr(n+mod(ia-1,mb_a))+ max(2,max(nb_a*max(1,ceil(NPROW-1, NPCOL)),LOCc(n+mod(ja-1,nb_a))+ nb_a*max(1,ceil(NPCOL-1,NPROW)))).


iwork

(local or global) INTEGER. The dimension of the array iwork;

used in real flavors only. Must be at least liwork ≥LOCr(n+mod(ia-1,mb_a)).

liwork

(local) REAL for pcpoconrworkDOUBLE PRECISION for pzpoconWorkspace array, DIMENSION (lrwork). Used in complexflavors only.

(local or global) INTEGER. The dimension of the array rwork;

used in complex flavors only. Must be at least lrwork ≥2*LOCc(n+mod(ja-1,nb_a)).

lrwork

Output Parameters

(global) REAL for single precision flavors.rcondDOUBLE PRECISION for double precision flavors.

1638


The reciprocal of the condition number of the distributedmatrix sub(A).


work(1)


iwork(1)


rwork(1)

(global) INTEGER. If info=0, the execution is successful.infoinfo < 0:If the ith argument is an array and the jth entry had anillegal value, then info = -(i*100+j); if the ith argumentis a scalar and had an illegal value, then info = -i.

p?trconEstimates the reciprocal of the condition numberof a triangular distributed matrix in either 1-normor infinity-norm.

Syntax

call pstrcon( norm, uplo, diag, n, a, ia, ja, desca, rcond, work, lwork,iwork, liwork, info)

call pdtrcon( norm, uplo, diag, n, a, ia, ja, desca, rcond, work, lwork,iwork, liwork, info)

call pctrcon( norm, uplo, diag, n, a, ia, ja, desca, rcond, work, lwork,rwork, lrwork, info)

call pztrcon( norm, uplo, diag, n, a, ia, ja, desca, rcond, work, lwork,rwork, lrwork, info)

Description

This routine estimates the reciprocal of the condition number of a triangular distributed matrixsub(A) = A(ia:ia+n-1, ja:ja+n-1), in either the 1-norm or the infinity-norm.

The norm of sub(A) is computed and an estimate is obtained for ||(sub(A))-1||, then thereciprocal of the condition number is computed as

1639

Input Parameters

(global) CHARACTER*1. Must be '1' or 'O' or 'I'.normSpecifies whether the 1-norm condition number or theinfinity-norm condition number is required.If norm = '1' or 'O', then the 1-norm is used;If norm = 'I', then the infinity-norm is used.

(global) CHARACTER*1. Must be 'U' or 'L'.uploIf uplo = 'U', sub(A) is upper triangular. If uplo = 'L',sub(A) is lower triangular.

(global) CHARACTER*1. Must be 'N' or 'U'.diagIf diag = 'N', sub(A) is non-unit triangular. If diag ='U', sub(A) is unit triangular.


sub(A), (n≥0).

n

(local)aREAL for pstrconDOUBLE PRECISION for pdtrconCOMPLEX for pctrconDOUBLE COMPLEX for pztrcon.Pointer into the local memory to an array of dimensiona(lld_a,LOCc(ja+n-1)).The array a contains the local pieces of the triangulardistributed matrix sub(A).If uplo = 'U', the leading n-by-n upper triangular part ofthis distributed matrix contains the upper triangular matrix,and its strictly lower triangular part is not referenced.If uplo = 'L', the leading n-by-n lower triangular part ofthis distributed matrix contains the lower triangular matrix,and its strictly upper triangular part is not referenced.

1640


If diag = 'U', the diagonal elements of sub(A) are alsonot referenced and are assumed to be 1.


ia, ja


desca

(local)workREAL for pstrconDOUBLE PRECISION for pdtrconCOMPLEX for pctrconDOUBLE COMPLEX for pztrcon.The array work of dimension (lwork) is a workspace array.


lwork ≥ 2*LOCr(n+mod(ia-1,mb_a))+LOCc(n+mod(ja-1,nb_a))+ max(2,max(nb_a*max(1,ceil(NPROW-1, NPCOL)),LOCc(n+mod(ja-1,nb_a))+ nb_a*max(1,ceil(NPCOL-1,NPROW))).For complex flavors:lwork must be at least

lwork ≥ 2*LOCr(n+mod(ia-1,mb_a))+ max(2,max(nb_a*ceil(NPROW-1, NPCOL),LOCc(n+mod(ja-1,nb_a))+ nb_a*ceil(NPCOL-1,NPROW))).


iwork


liwork

liwork ≥ LOCr(n+mod(ia-1,mb_a)).

(local) REAL for pcpoconrworkDOUBLE PRECISION for pzpoconWorkspace array, DIMENSION (lrwork). Used in complexflavors only.

1641


(local or global) INTEGER. The dimension of the array rwork;used in complex flavors only. Must be at least

lrwork

lrwork ≥ LOCc(n+mod(ja-1,nb_a)).

Output Parameters

(global) REAL for single precision flavors.rcondDOUBLE PRECISION for double precision flavors.The reciprocal of the condition number of the distributedmatrix sub(A).


work(1)


iwork(1)


rwork(1)


Refining the Solution and Estimating Its Error

This section describes the ScaLAPACK routines for refining the computed solution of a systemof linear equations and estimating the solution error. You can call these routines after factorizingthe matrix of the system of equations and computing the solution (see Routines for MatrixFactorization and Solving Systems of Linear Equations).

1642


p?gerfsImproves the computed solution to a system oflinear equations and provides error bounds andbackward error estimates for the solution.

Syntax

call psgerfs(trans, n, nrhs, a, ia, ja, desca, af, iaf, jaf, descaf, ipiv,b, ib, jb, descb, x, ix, jx, descx, ferr, berr, work, lwork, iwork, liwork,info)

call pdgerfs(trans, n, nrhs, a, ia, ja, desca, af, iaf, jaf, descaf, ipiv,b, ib, jb, descb, x, ix, jx, descx, ferr, berr, work, lwork, iwork, liwork,info)

call pcgerfs(trans, n, nrhs, a, ia, ja, desca, af, iaf, jaf, descaf, ipiv,b, ib, jb, descb, x, ix, jx, descx, ferr, berr, work, lwork, rwork, lrwork,info)

call pzgerfs(trans, n, nrhs, a, ia, ja, desca, af, iaf, jaf, descaf, ipiv,b, ib, jb, descb, x, ix, jx, descx, ferr, berr, work, lwork, rwork, lrwork,info)

Description

This routine improves the computed solution to one of the systems of linear equations

sub(A)*sub(X) = sub(B),

sub(A)T*sub(X) = sub(B), or

sub(A)T*sub(X) = sub(B) and provides error bounds and backward error estimates for thesolution.

Here sub(A) = A(ia:ia+n-1, ja:ja+n-1), sub(B) = B(ib:ib+n-1, jb:jb+nrhs-1),and sub(X) = X(ix:ix+n-1, jx:jx+nrhs-1).

Input Parameters

(global) CHARACTER*1. Must be 'N' or 'T' or 'C'.transSpecifies the form of the system of equations:If trans = 'N', the system has the form sub(A)*sub(X)= sub(B) (No transpose);

1643


If trans = 'T', the system has the form sub(A)T*sub(X)= sub(B) (Transpose);If trans = 'C', the system has the form sub(A)H*sub(X)= sub(B) (Conjugate transpose).


sub(A) (n ≥ 0).

n

(global) INTEGER. The number of right-hand sides, i.e., thenumber of columns of the matrices sub(B) and sub(X) (nrhs

≥ 0).

nrhs

(local)a, af, b, xREAL for psgerfsDOUBLE PRECISION for pdgerfsCOMPLEX for pcgerfsDOUBLE COMPLEX for pzgerfs.Pointers into the local memory to arrays of local dimensiona(lld_a,LOCc(ja+n-1)), af(lld_af,LOCc(jaf+n-1)),b(lld_b,LOCc(jb+nrhs-1)), andx(lld_x,LOCc(jx+nrhs-1)), respectively.The array a contains the local pieces of the distributedmatrix sub(A).The array af contains the local pieces of the distributedfactors of the matrix sub(A) = P*L*U as computed byp?getrf.The array b contains the local pieces of the distributedmatrix of right hand sides sub(B).On entry, the array x contains the local pieces of thedistributed solution matrix sub(X).


ia, ja


desca

(global) INTEGER. The row and column indices in the globalarray AF indicating the first row and the first column of thesubmatrix sub(AF), respectively.

iaf, jaf

1644


(global and local) INTEGER array, dimension (dlen_). Thearray descriptor for the distributed matrix AF.

descaf


ib, jb


descb

(global) INTEGER. The row and column indices in the globalarray X indicating the first row and the first column of thesubmatrix sub(X), respectively.

ix, jx

(global and local) INTEGER array, dimension (dlen_). Thearray descriptor for the distributed matrix X.

descx

(local) INTEGER.ipivArray, dimension LOCr(m_af + mb_af.This array contains pivoting information as computed byp?getrf. If ipiv(i)=j, then the local row i was swappedwith the global row j.This array is tied to the distributed matrix A.

(local)workREAL for psgerfsDOUBLE PRECISION for pdgerfsCOMPLEX for pcgerfsDOUBLE COMPLEX for pzgerfs.The array work of dimension (lwork) is a workspace array.


lwork ≥ 3*LOCr(n+mod(ia-1,mb_a))For complex flavors:lwork must be at least

lwork ≥ 2*LOCr(n+mod(ia-1,mb_a))


iwork


liwork

1645


liwork ≥ LOCr(n+mod(ib-1,mb_b)).

(local) REAL for pcgerfsrworkDOUBLE PRECISION for pzgerfsWorkspace array, DIMENSION (lrwork). Used in complexflavors only.


used in complex flavors only. Must be at least lrwork ≥LOCr(n+mod(ib-1,mb_b))).

lrwork

Output Parameters

On exit, contains the improved solution vectors.x

REAL for single precision flavors.ferr, berrDOUBLE PRECISION for double precision flavors.Arrays, dimension LOCc(jb+nrhs-1) each.The array ferr contains the estimated forward error boundfor each solution vector of sub(X).If XTRUE is the true solution corresponding to sub(X), ferris an estimated upper bound for the magnitude of the largestelement in (sub(X) - XTRUE) divided by the magnitudeof the largest element in sub(X). The estimate is as reliableas the estimate for rcond, and is almost always a slightoverestimate of the true error.This array is tied to the distributed matrix X.The array berr contains the component-wise relativebackward error of each solution vector (that is, the smallestrelative change in any entry of sub(A) or sub(B) that makessub(X) an exact solution). This array is tied to the distributedmatrix X.


work(1)


iwork(1)


rwork(1)

(global) INTEGER. If info=0, the execution is successful.info

1646


info < 0:If the ith argument is an array and the jth entry had anillegal value, then info = -(i*100+j); if the i-thargument is a scalar and had an illegal value, then info =-i.

p?porfsImproves the computed solution to a system oflinear equations with symmetric/Hermitian positivedefinite distributed matrix and provides errorbounds and backward error estimates for thesolution.

Syntax

call psporfs(uplo, n, nrhs, a, ia, ja, desca, af, iaf, jaf, descaf, b, ib,jb, descb, x, ix, jx, descx, ferr, berr, work, lwork, iwork, liwork, info)

call pdporfs(uplo, n, nrhs, a, ia, ja, desca, af, iaf, jaf, descaf, b, ib,jb, descb, x, ix, jx, descx, ferr, berr, work, lwork, iwork, liwork, info)

call pcporfs(uplo, n, nrhs, a, ia, ja, desca, af, iaf, jaf, descaf, b, ib,jb, descb, x, ix, jx, descx, ferr, berr, work, lwork, rwork, lrwork, info)

call pzporfs(uplo, n, nrhs, a, ia, ja, desca, af, iaf, jaf, descaf, b, ib,jb, descb, x, ix, jx, descx, ferr, berr, work, lwork, rwork, lrwork, info)

Description

The routine p?porfs improves the computed solution to the system of linear equations


where sub(A) = A(ia:ia+n-1, ja:ja+n-1) is a real symmetric or complex Hermitian positivedefinite distributed matrix and

sub(B) = B(ib:ib+n-1, jb:jb+nrhs-1),

sub(X) = X(ix:ix+n-1, jx:jx+nrhs-1)

are right-hand side and solution submatrices, respectively. This routine also provides errorbounds and backward error estimates for the solution.

1647

Input Parameters

(global) CHARACTER*1. Must be 'U' or 'L'.uploSpecifies whether the upper or lower triangular part of thesymmetric/Hermitian matrix sub(A) is stored.If uplo = 'U', sub(A) is upper triangular. If uplo = 'L',sub(A) is lower triangular.

(global) INTEGER. The order of the distributed matrix sub(A)

(n≥0).

n

(global) INTEGER. The number of right-hand sides, i.e., thenumber of columns of the matrices sub(B) and sub(X)

(nrhs≥0).

nrhs

(local)a, af, b, xREAL for psporfsDOUBLE PRECISION for pdporfsCOMPLEX for pcporfsDOUBLE COMPLEX for pzporfs.Pointers into the local memory to arrays of local dimensiona(lld_a,LOCc(ja+n-1)), af(lld_af,LOCc(ja+n-1)),b(lld_b,LOCc(jb+nrhs-1)), andx(lld_x,LOCc(jx+nrhs-1)), respectively.The array a contains the local pieces of the n-by-nsymmetric/Hermitian distributed matrix sub(A).If uplo = 'U', the leading n-by-n upper triangular part ofsub(A) contains the upper triangular part of the matrix, andits strictly lower triangular part is not referenced.If uplo = 'L', the leading n-by-n lower triangular part ofsub(A) contains the lower triangular part of the distributedmatrix, and its strictly upper triangular part is notreferenced.The array af contains the factors L or U from the Choleskyfactorization sub(A) = L*LH or sub(A) = UH*U, ascomputed by p?potrf.On entry, the array b contains the local pieces of thedistributed matrix of right hand sides sub(B).On entry, the array x contains the local pieces of the solutionvectors sub(X).

1648



ia, ja


desca

(global) INTEGER. The row and column indices in the globalarray AF indicating the first row and the first column of thesubmatrix sub(AF), respectively.

iaf, jaf


descaf


ib, jb


descb


ix, jx


descx

(local)workREAL for psporfsDOUBLE PRECISION for pdporfsCOMPLEX for pcporfsDOUBLE COMPLEX for pzporfs.The array work of dimension (lwork) is a workspace array.

(local) INTEGER. The dimension of the array work.lworkFor real flavors:lwork must be at least

lwork ≥ 3*LOCr(n+mod(ia-1,mb_a))For complex flavors:lwork must be at least



iwork

1649



liwork


(local) REAL for pcporfsrworkDOUBLE PRECISION for pzporfsWorkspace array, DIMENSION (lrwork). Used in complexflavors only.



lrwork

Output Parameters

On exit, contains the improved solution vectors.x

REAL for single precision flavors.ferr, berrDOUBLE PRECISION for double precision flavors.Arrays, dimension LOCc(jb+nrhs-1) each.The array ferr contains the estimated forward error boundfor each solution vector of sub(X).If XTRUE is the true solution corresponding to sub(X), ferris an estimated upper bound for the magnitude of the largestelement in (sub(X) - XTRUE)divided by the magnitude ofthe largest element in sub(X). The estimate is as reliableas the estimate for rcond, and is almost always a slightoverestimate of the true error.This array is tied to the distributed matrix X.The array berr contains the component-wise relativebackward error of each solution vector (that is, the smallestrelative change in any entry of sub(A) or sub(B) that makessub(X) an exact solution). This array is tied to the distributedmatrix X.


work(1)


iwork(1)


rwork(1)

1650



p?trrfsProvides error bounds and backward errorestimates for the solution to a system of linearequations with a distributed triangular coefficientmatrix.

Syntax

call pstrrfs(uplo, trans, diag, n, nrhs, a, ia, ja, desca, b, ib, jb, descb,x, ix, jx, descx, ferr, berr, work, lwork, iwork, liwork, info)

call pdtrrfs(uplo, trans, diag, n, nrhs, a, ia, ja, desca, b, ib, jb, descb,x, ix, jx, descx, ferr, berr, work, lwork, iwork, liwork, info)

call pctrrfs(uplo, trans, diag, n, nrhs, a, ia, ja, desca, b, ib, jb, descb,x, ix, jx, descx, ferr, berr, work, lwork, rwork, lrwork, info)

call pztrrfs(uplo, trans, diag, n, nrhs, a, ia, ja, desca, b, ib, jb, descb,x, ix, jx, descx, ferr, berr, work, lwork, rwork, lrwork, info)

Description

The routine p?trrfs provides error bounds and backward error estimates for the solution toone of the systems of linear equations


sub(A)T*sub(X) = sub(B), or

sub(A)H*sub(X) = sub(B) ,

where sub(A) = A(ia:ia+n-1, ja:ja+n-1) is a triangular matrix,

sub(B) = B(ib:ib+n-1, jb:jb+nrhs-1), and

sub(X) = X(ix:ix+n-1, jx:jx+nrhs-1).

1651


The solution matrix X must be computed by p?trtrs or some other means before enteringthis routine. The routine p?trrfs does not do iterative refinement because doing so cannotimprove the backward error.

Input Parameters

(global) CHARACTER*1. Must be 'U' or 'L'.uploIf uplo = 'U', sub(A) is upper triangular. If uplo = 'L',sub(A) is lower triangular.

(global) CHARACTER*1. Must be 'N' or 'T' or 'C'.transSpecifies the form of the system of equations:If trans = 'N', the system has the form sub(A)*sub(X)= sub(B) (No transpose);If trans = 'T', the system has the form sub((A)T*sub(X)= sub(B) (Transpose);If trans = 'C', the system has the form sub(A)H*sub(X)= sub(B) (Conjugate transpose).

CHARACTER*1. Must be 'N' or 'U'.diagIf diag = 'N', then sub(A) is non-unit triangular.If diag = 'U', then sub(A) is unit triangular.


(n≥0).

n

(global) INTEGER. The number of right-hand sides, that is,the number of columns of the matrices sub(B) and sub(X)

(nrhs≥0).

nrhs

(local)a, b, xREAL for pstrrfsDOUBLE PRECISION for pdtrrfsCOMPLEX for pctrrfsDOUBLE COMPLEX for pztrrfs.Pointers into the local memory to arrays of local dimensiona(lld_a,LOCc(ja+n-1)), b(lld_b,LOCc(jb+nrhs-1)),and x(lld_x,LOCc(jx+nrhs-1)), respectively.The array a contains the local pieces of the original triangulardistributed matrix sub(A).

1652


If uplo = 'U', the leading n-by-n upper triangular part ofsub(A) contains the upper triangular part of the matrix, andits strictly lower triangular part is not referenced.If uplo = 'L', the leading n-by-n lower triangular part ofsub(A) contains the lower triangular part of the distributedmatrix, and its strictly upper triangular part is notreferenced.If diag = 'U', the diagonal elements of sub(A) are alsonot referenced and are assumed to be 1.On entry, the array b contains the local pieces of thedistributed matrix of right hand sides sub(B).On entry, the array x contains the local pieces of the solutionvectors sub(X).


ia, ja


desca


ib, jb


descb


ix, jx


descx

(local)workREAL for pstrrfsDOUBLE PRECISION for pdtrrfsCOMPLEX for pctrrfsDOUBLE COMPLEX for pztrrfs.The array work of dimension (lwork) is a workspace array.

(local) INTEGER. The dimension of the array work.lworkFor real flavors:

1653


lwork must be at least lwork ≥3*LOCr(n+mod(ia-1,mb_a))For complex flavors:lwork must be at least



iwork


liwork


(local) REAL for pctrrfsrworkDOUBLE PRECISION for pztrrfsWorkspace array, DIMENSION (lrwork). Used in complexflavors only.



lrwork

Output Parameters

REAL for single precision flavors.ferr, berrDOUBLE PRECISION for double precision flavors.Arrays, dimension LOCc(jb+nrhs-1) each.The array ferr contains the estimated forward error boundfor each solution vector of sub(X).If XTRUE is the true solution corresponding to sub(X), ferris an estimated upper bound for the magnitude of the largestelement in (sub(X) - XTRUE) divided by the magnitude ofthe largest element in sub(X). The estimate is as reliable asthe estimate for rcond, and is almost always a slightoverestimate of the true error.This array is tied to the distributed matrix X.

1654


The array berr contains the component-wise relativebackward error of each solution vector (that is, the smallestrelative change in any entry of sub(A) or sub(B) that makessub(X) an exact solution). This array is tied to the distributedmatrix X.


work(1)


iwork(1)


rwork(1)


Routines for Matrix Inversion

This sections describes ScaLAPACK routines that compute the inverse of a matrix based on thepreviously obtained factorization. Note that it is not recommended to solve a system of equationsAx = b by first computing A-1 and then forming the matrix-vector product x = A-1b. Call asolver routine instead (see Solving Systems of Linear Equations); this is more efficient andmore accurate.

p?getriComputes the inverse of a LU-factored distributedmatrix.

Syntax

call psgetri(n, a, ia, ja, desca, ipiv, work, lwork, iwork, liwork, info)

call pdgetri(n, a, ia, ja, desca, ipiv, work, lwork, iwork, liwork, info)

call pcgetri(n, a, ia, ja, desca, ipiv, work, lwork, iwork, liwork, info)

call pzgetri(n, a, ia, ja, desca, ipiv, work, lwork, iwork, liwork, info)

1655


Description

This routine computes the inverse of a general distributed matrix sub(A) = A(ia:ia+n-1,ja:ja+n-1) using the LU factorization computed by p?getrf. This method inverts U and thencomputes the inverse of sub(A) by solving the system

inv(sub(A))*L = inv(U)

for inv(sub(A).

Input Parameters


sub(A) (n≥0).

n

(local)aREAL for psgetriDOUBLE PRECISION for pdgetriCOMPLEX for pcgetriDOUBLE COMPLEX for pzgetri.Pointer into the local memory to an array of local dimensiona(lld_a,LOCc(ja+n-1)).On entry, the array a contains the local pieces of the L andU obtained by the factorization sub(A) = P*L*U computedby p?getrf.


ia, ja


desca

(local)workREAL for psgetriDOUBLE PRECISION for pdgetriCOMPLEX for pcgetriDOUBLE COMPLEX for pzgetri.The array work of dimension (lwork) is a workspace array.

(local) INTEGER. The dimension of the array work. lworkmust be at least

lwork

lwork≥LOCr(n+mod(ia-1,mb_a))*nb_a.

1656


The array work is used to keep at most an entire columnblock of sub(A).

(local) INTEGER. Workspace array used for physicallytransposing the pivots, DIMENSION (liwork).

iwork

(local or global) INTEGER. The dimension of the array iwork.liworkThe minimal value liwork of is determined by the followingcode:

if NPROW == NPCOL then

liwork = LOCc(n_a + mod(ja-1,nb_a))+ nb_a

else

liwork = LOCc(n_a + mod(ja-1,nb_a)) +

max(ceil(ceil(LOCr(m_a)/mb_a)/(lcm/NPROW)),nb_a)

end if

where lcm is the least common multiple of process rowsand columns (NPROW and NPCOL).

Output Parameters

(local) INTEGER.ipivArray, dimension (LOCr(m_a)+ mb_a).This array contains the pivoting information.If ipiv(i)=j, then the local row i was swapped with theglobal row j.This array is tied to the distributed matrix A.


work(1)

On exit, iwork(1) contains the minimum value of liworkrequired for optimum performance.

iwork(1)

(global) INTEGER. If info=0, the execution is successful.infoinfo < 0:

1657

If the i-th argument is an array and the j-th entry had anillegal value, then info = -(i*100+j); if the i-thargument is a scalar and had an illegal value, then info =-i.info > 0:If info = i, U(i,i) is exactly zero. The factorization hasbeen completed, but the factor U is exactly singular, anddivision by zero will occur if it is used to solve a system ofequations.

p?potriComputes the inverse of a symmetric/Hermitianpositive definite distributed matrix.

Syntax

call pspotri(uplo, n, a, ia, ja, desca, info)

call pdpotri(uplo, n, a, ia, ja, desca, info)

call pcpotri(uplo, n, a, ia, ja, desca, info)

call pzpotri(uplo, n, a, ia, ja, desca, info)

Description

This routine computes the inverse of a real symmetric or complex Hermitian positive definitedistributed matrix sub(A) = A(ia:ia+n-1, ja:ja+n-1) using the Cholesky factorizationsub(A) = UH*U or sub(A) = L*LH computed by p?potrf.

Input Parameters

(global) CHARACTER*1. Must be 'U' or 'L'.uploSpecifies whether the upper or lower triangular part of thesymmetric/Hermitian matrix sub(A) is stored.If uplo = 'U', upper triangle of sub(A) is stored. If uplo= 'L', lower triangle of sub(A) is stored.


sub(A) (n≥0).

n

1658


(local)aREAL for pspotriDOUBLE PRECISION for pdpotriCOMPLEX for pcpotriDOUBLE COMPLEX for pzpotri.Pointer into the local memory to an array of local dimensiona(lld_a,LOCc(ja+n-1)).On entry, the array a contains the local pieces of thetriangular factor U or L from the Cholesky factorizationsub(A) = UH*U, or sub(A) = L*LH, as computed byp?potrf.


ia, ja


desca

Output Parameters

On exit, overwritten by the local pieces of the upper or lowertriangle of the (symmetric/Hermitian) inverse of sub(A).

a

(global) INTEGER. If info=0, the execution is successful.infoinfo < 0:If the i-th argument is an array and the j-th entry had anillegal value, then info = -(i*100+j); if the i-thargument is a scalar and had an illegal value, then info =-i.info > 0:If info = i, the (i, i) element of the factor U or L is zero,and the inverse could not be computed.

1659

p?trtriComputes the inverse of a triangular distributedmatrix.

Syntax

call pstrtri(uplo, diag, n, a, ia, ja, desca, info)

call pdtrtri(uplo, diag, n, a, ia, ja, desca, info)

call pctrtri(uplo, diag, n, a, ia, ja, desca, info)

call pztrtri(uplo, diag, n, a, ia, ja, desca, info)

Description

This routine computes the inverse of a real or complex upper or lower triangular distributedmatrix sub(A) = A(ia:ia+n-1, ja:ja+n-1).

Input Parameters

(global) CHARACTER*1. Must be 'U' or 'L'.uploSpecifies whether the distributed matrix sub(A) is upper orlower triangular.If uplo = 'U', sub(A) is upper triangular.If uplo = 'L', sub(A) is lower triangular.

CHARACTER*1. Must be 'N' or 'U'.diagSpecifies whether or not the distributed matrix sub(A) isunit triangular.If diag = 'N', then sub(A) is non-unit triangular.If diag = 'U', then sub(A) is unit triangular.


sub(A) (n≥0).

n

(local)aREAL for pstrtriDOUBLE PRECISION for pdtrtriCOMPLEX for pctrtriDOUBLE COMPLEX for pztrtri.

1660


Pointer into the local memory to an array of local dimensiona(lld_a,LOCc(ja+n-1)).The array a contains the local pieces of the triangulardistributed matrix sub(A).If uplo = 'U', the leading n-by-n upper triangular part ofsub(A) contains the upper triangular matrix to be inverted,and the strictly lower triangular part of sub(A) is notreferenced.If uplo = 'L', the leading n-by-n lower triangular part ofsub(A) contains the lower triangular matrix, and the strictlyupper triangular part of sub(A) is not referenced.


ia, ja


desca

Output Parameters

On exit, overwritten by the (triangular) inverse of theoriginal matrix.

a

(global) INTEGER. If info=0, the execution is successful.infoinfo < 0:If the i-th argument is an array and the j-th entry had anillegal value, then info = -(i*100+j); if the i-thargument is a scalar and had an illegal value, then info =-i.info > 0:If info = k, A(ia+k-1, ja+k-1) is exactly zero. Thetriangular matrix sub(A) is singular and its inverse can notbe computed.

Routines for Matrix Equilibration

ScaLAPACK routines described in this section are used to compute scaling factors needed toequilibrate a matrix. Note that these routines do not actually scale the matrices.

1661

p?geequComputes row and column scaling factors intendedto equilibrate a general rectangular distributedmatrix and reduce its condition number.

Syntax

call psgeequ(m, n, a, ia, ja, desca, r, c, rowcnd, colcnd, amax, info)

call pdgeequ(m, n, a, ia, ja, desca, r, c, rowcnd, colcnd, amax, info)

call pcgeequ(m, n, a, ia, ja, desca, r, c, rowcnd, colcnd, amax, info)

call pzgeequ(m, n, a, ia, ja, desca, r, c, rowcnd, colcnd, amax, info)

Description

This routine computes row and column scalings intended to equilibrate an m-by-n distributedmatrix sub(A) = A(ia:ia+m-1, ja:ja+n-1) and reduce its condition number. The outputarray r returns the row scale factors and the array c the column scale factors. These factorsare chosen to try to make the largest element in each row and column of the matrix B withelements bij=r(i)*aij*c(j) have absolute value 1.

r(i) and c(j) are restricted to be between SMLNUM = smallest safe number and BIGNUM = largestsafe number. Use of these scaling factors is not guaranteed to reduce the condition number ofsub(A) but works well in practice.

The auxiliary function p?laqge uses scaling factors computed by p?geequ to scale a generalrectangular matrix.

Input Parameters

(global) INTEGER. The number of rows to be operated on,that is, the number of rows of the distributed submatrix

sub(A) (m ≥ 0).

m

(global) INTEGER. The number of columns to be operatedon, that is, the number of columns of the distributed

submatrix sub(A) (n ≥ 0).

n

(local)aREAL for psgeequDOUBLE PRECISION for pdgeequ

1662


COMPLEX for pcgeequDOUBLE COMPLEX for pzgeequ .Pointer into the local memory to an array of local dimensiona(lld_a,LOCc(ja+n-1)).The array a contains the local pieces of the m-by-ndistributed matrix whose equilibration factors are to becomputed.


ia, ja


desca

Output Parameters

(local) REAL for single precision flavors;r, cDOUBLE PRECISION for double precision flavors.Arrays, dimension LOCr(m_a) and LOCc(n_a), respectively.If info = 0, or info > ia+m-1, the array r (ia:ia+m-1)contains the row scale factors for sub(A). r is aligned withthe distributed matrix A, and replicated across every processcolumn. r is tied to the distributed matrix A.If info = 0, the array c (ja:ja+n-1) contains the columnscale factors for sub(A). c is aligned with the distributedmatrix A, and replicated down every process row. c is tiedto the distributed matrix A.

(global) REAL for single precision flavors;rowcnd, colcndDOUBLE PRECISION for double precision flavors.If info = 0 or info > ia+m-1, rowcnd contains the ratio

of the smallest r(i) to the largest r(i) (ia ≤ i ≤

ia+m-1). If rowcnd ≥ 0.1 and amax is neither too largenor too small, it is not worth scaling by r (ia:ia+m-1).If info = 0, colcnd contains the ratio of the smallest c(j)

to the largest c(j) (ja ≤ j ≤ ja+n-1).

If colcnd ≥ 0.1, it is not worth scaling by c(ja:ja+n-1).

(global) REAL for single precision flavors;amax

1663


DOUBLE PRECISION for double precision flavors.Absolute value of the largest matrix element. If amax is veryclose to overflow or very close to underflow, the matrixshould be scaled.

(global) INTEGER. If info=0, the execution is successful.infoinfo < 0:If the ith argument is an array and the jth entry had anillegal value, then info = -(i*100+j); if the i-thargument is a scalar and had an illegal value, then info =-i.info > 0:If info = i and

i≤ m, the ith row of the distributed matrixsub(A) is exactly zero;i > m, the (i-m)th column of the distributedmatrix sub(A) is exactly zero.

p?poequComputes row and column scaling factors intendedto equilibrate a symmetric (Hermitian) positivedefinite distributed matrix and reduce its conditionnumber.

Syntax

call pspoequ(n, a, ia, ja, desca, sr, sc, scond, amax, info)

call pdpoequ(n, a, ia, ja, desca, sr, sc, scond, amax, info)

call pcpoequ(n, a, ia, ja, desca, sr, sc, scond, amax, info)

call pzpoequ(n, a, ia, ja, desca, sr, sc, scond, amax, info)

Description

This routine computes row and column scalings intended to equilibrate a real symmetric orcomplex Hermitian positive definite distributed matrix sub(A) = A(ia:ia+n-1, ja:ja+n-1)and reduce its condition number (with respect to the two-norm). The output arrays sr and screturn the row and column scale factors

1664

These factors are chosen so that the scaled distributed matrix B with elementsbij=s(i)*aij*s(j) has ones on the diagonal.

This choice of sr and sc puts the condition number of B within a factor n of the smallest possiblecondition number over all possible diagonal scalings.

The auxiliary function p?laqsy uses scaling factors computed by p?geequ to scale a generalrectangular mtrix.

Input Parameters


sub(A) (n≥0).

n

(local)aREAL for pspoequDOUBLE PRECISION for pdpoequCOMPLEX for pcpoequDOUBLE COMPLEX for pzpoequ.Pointer into the local memory to an array of local dimensiona(lld_a,LOCc(ja+n-1)).The array a contains the n-by-n symmetric/Hermitianpositive definite distributed matrix sub(A) whose scalingfactors are to be computed. Only the diagonal elements ofsub(A) are referenced.


ia, ja


desca

Output Parameters

(local)sr, sc

1665


REAL for single precision flavors;DOUBLE PRECISION for double precision flavors.Arrays, dimension LOCr(m_a) and LOCc(n_a), respectively.If info = 0, the array sr(ia:ia+n-1) contains the rowscale factors for sub(A). sr is aligned with the distributedmatrix A, and replicated across every process column. sris tied to the distributed matrix A.If info = 0, the array sc (ja:ja+n-1) contains thecolumn scale factors for sub(A). sc is aligned with thedistributed matrix A, and replicated down every processrow. sc is tied to the distributed matrix A.

(global)scondREAL for single precision flavors;DOUBLE PRECISION for double precision flavors.If info = 0, scond contains the ratio of the smallest sr(i)( or sc(j)) to the largest sr(i) ( or sc(j)), with

ia≤i≤ia+n-1 and ja≤j≤ja+n-1.

If scond ≥ 0.1 and amax is neither too large nor too small,it is not worth scaling by sr ( or sc ).

(global)amaxREAL for single precision flavors;DOUBLE PRECISION for double precision flavors.Absolute value of the largest matrix element. If amax is veryclose to overflow or very close to underflow, the matrixshould be scaled.

(global) INTEGER.infoIf info=0, the execution is successful.info < 0:If the i-th argument is an array and the j-th entry had anillegal value, then info = -(i*100+j); if the i-thargument is a scalar and had an illegal value, then info =-i.info > 0:If info = k, the k-th diagonal entry of sub(A) is nonpositive.

1666


This section describes the ScaLAPACK routines for the QR (RQ) and LQ (QL) factorization ofmatrices. Routines for the RZ factorization as well as for generalized QR and RQ factorizationsare also included. For the mathematical definition of the factorizations, see the respectiveLAPACK sections or refer to [SLUG].

Table 6-3 lists ScaLAPACK routines that perform orthogonal factorization of matrices.

Table 6-3 Computational Routines for Orthogonal Factorizations

Apply matrix QGeneratematrix Q

Factorize withpivoting

Factorize withoutpivoting

Matrix type,factorization

p?ormqrp?unmqrp?orgqrp?ungqr

p?geqpfp?geqrfgeneral matrices, QRfactorization

p?ormrqp?unmrqp?orgrqp?ungrqp?gerqfgeneral matrices, RQfactorization

p?ormlqp?unmlqp?orglqp?unglqp?gelqfgeneral matrices, LQfactorization

p?ormqlp?unmqlp?orgqlp?ungqlp?geqlfgeneral matrices, QLfactorization

p?ormrzp?unmrzp?tzrzftrapezoidal matrices, RZfactorization

p?ggqrfpair of matrices,generalized QRfactorization

p?ggrqfpair of matrices,generalized RQfactorization

p?geqrfComputes the QR factorization of a general m-by-nmatrix.

Syntax

call psgeqrf( m, n, a, ia, ja, desca, tau, work, lwork, info)

call pdgeqrf( m, n, a, ia, ja, desca, tau, work, lwork, info)

call pcgeqrf( m, n, a, ia, ja, desca, tau, work, lwork, info)

call pzgeqrf( m, n, a, ia, ja, desca, tau, work, lwork, info)

1667


Description

The routine forms the QR factorization of a general m-by-n distributed matrix sub(A)=A(ia:ia+m-1,ja:ja+n-1) as

A=Q*R

Input Parameters


submatrix sub(A); (m ≥ 0).

m


submatrix sub(A); (n ≥ 0).

n

(local)aREAL for psgeqrfDOUBLE PRECISION for pdgeqrfCOMPLEX for pcgeqrfDOUBLE COMPLEX for pzgeqrf.Pointer into the local memory to an array of local dimension(lld_a, LOCc(ja+n-1)).Contains the local pieces of the distributed matrix sub(A)to be factored.

(global) INTEGER. The row and column indices in the globalarray a indicating the first row and the first column of thesubmatrix A(ia:ia+m-1,ja:ja+n-1), respectively.

ia, ja

(global and local) INTEGER array, dimension (dlen_). Thearray descriptor for the distributed matrix A

desca

(local).workREAL for psgeqrfDOUBLE PRECISION for pdgeqrf.COMPLEX for pcgeqrf.DOUBLE COMPLEX for pzgeqrfWorkspace array of dimension lwork.

(local or global) INTEGER, dimension of work, must be at

least lwork ≥ nb_a * (mp0+nq0+nb_a), where

lwork

iroff = mod(ia-1, mb_a), icoff = mod(ja-1, nb_a),iarow = indxg2p(ia, mb_a, MYROW, rsrc_a, NPROW),

1668


iacol = indxg2p(ja, nb_a, MYCOL, csrc_a, NPCOL),mp0 = numroc(m+iroff, mb_a, MYROW, iarow, NPROW),nq0 = numroc(n+icoff, nb_a, MYCOL, iacol, NPCOL),and numroc, indxg2p are ScaLAPACK tool functions; MYROW,MYCOL, NPROW and NPCOL can be determined by calling thesubroutine blacs_gridinfo.If lwork = -1, then lwork is global input and a workspacequery is assumed; the routine only calculates the minimumand optimal size for all work arrays. Each of these valuesis returned in the first entry of the corresponding work array,and no error message is issued by pxerbla.

Output Parameters

The elements on and above the diagonal of sub(A) containthe min(m,n)-by-n upper trapezoidal matrix R (R is upper

triangular if m ≥ n); the elements below the diagonal, with

a

the array tau, represent the orthogonal/unitary matrix Q asa product of elementary reflectors (see Application Notesbelow).

(local)tauREAL for psgeqrfDOUBLE PRECISION for pdgeqrfCOMPLEX for pcgeqrfDOUBLE COMPLEX for pzgeqrf.Array, DIMENSION LOCc(ja+min(m,n)-1).Contains the scalar factor tau of elementary reflectors. tauis tied to the distributed matrix A.


work(1)

(global) INTEGER.info= 0, the execution is successful.< 0, if the i-th argument is an array and the j-entry hadan illegal value, then info = - (i* 100+j), if the i-thargument is a scalar and had an illegal value, then info =-i.

1669

Application Notes

The matrix Q is represented as a product of elementary reflectors

Q = H(ja) H(ja+1)... H(ja+k-1),

where k = min(m,n).


H(j) = I - tau * v * v'

where tau is a real/complex scalar, and v is a real/complex vector with v(1:i-1) = 0 andv(i) = 1; v(i+1:m) is stored on exit in A(ia+i:ia+m-1, ja+i-1), and tau in tau(ja+i-1).

p?geqpfComputes the QR factorization of a general m-by-nmatrix with pivoting.

Syntax

call psgeqpf( m, n, a, ia, ja, desca, ipiv, tau, work, lwork, info)

call pdgeqpf( m, n, a, ia, ja, desca, ipiv, tau, work, lwork, info)

call pcgeqpf( m, n, a, ia, ja, desca, ipiv, tau, work, lwork, info)

call pzgeqpf( m, n, a, ia, ja, desca, ipiv, tau, work, lwork, info)

Description

The routine forms the QR factorization with column pivoting of a general m-by-n distributedmatrix sub(A)= A(ia:ia+m-1,ja:ja+n-1) as

sub(A)*P=Q*R

Input Parameters

(global) INTEGER. The number of rows in the submatrix

sub(A) (m ≥ 0).

m

(global) INTEGER. The number of columns in the submatrix

sub(A) (n ≥ 0).

n

(local)aREAL for psgeqpf

1670


DOUBLE PRECISION for pdgeqpfCOMPLEX for pcgeqpfDOUBLE COMPLEX for pzgeqpf.Pointer into the local memory to an array of local dimension(lld_a, LOCc(ja+n-1)).Contains the local pieces of the distributed matrix sub(A)to be factored.


ia, ja


desca

(local).workREAL for psgeqpfDOUBLE PRECISION for pdgeqpf.COMPLEX for pcgeqpf.DOUBLE COMPLEX for pzgeqpfWorkspace array of dimension lwork.

(local or global) INTEGER, dimension of work, must be atleast

lwork

For real flavors:

lwork ≥ max(3,mp0+nq0) + LOCc (ja+n-1) + nq0.For complex flavors:

lwork ≥ max(3,mp0+nq0) .Hereiroff = mod(ia-1, mb_a), icoff = mod(ja-1, nb_a),iarow = indxg2p(ia, mb_a, MYROW, rsrc_a, NPROW),iacol = indxg2p(ja, nb_a, MYCOL, csrc_a, NPCOL),mp0 = numroc(m+iroff, mb_a, MYROW, iarow, NPROW),nq0 = numroc(n+icoff, nb_a, MYCOL, iacol, NPCOL),LOCc (ja+n-1) = numroc(ja+n-1, nb_a,MYCOL,csrc_a, NPCOL), and numroc, indxg2p areScaLAPACK tool functions; MYROW, MYCOL, NPROW and NPCOLcan be determined by calling the subroutineblacs_gridinfo.

1671


If lwork = -1, then lwork is global input and a workspacequery is assumed; the routine only calculates the minimumand optimal size for all work arrays. Each of these valuesis returned in the first entry of the corresponding work array,and no error message is issued by pxerbla.

Output Parameters

The elements on and above the diagonal of sub(A)containthe min(m, n)-by-n upper trapezoidal matrix R (R is upper

triangular if m ≥ n); the elements below the diagonal, with

a

the array tau, represent the orthogonal/unitary matrix Q asa product of elementary reflectors (see Application Notesbelow)

(local) INTEGER. Array, DIMENSION LOCc(ja+n-1).ipivipiv(i) = k, the local i-th column of sub(A)*P was theglobal k-th column of sub(A). ipiv is tied to the distributedmatrix A.

(local)tauREAL for psgeqpfDOUBLE PRECISION for pdgeqpfCOMPLEX for pcgeqpfDOUBLE COMPLEX for pzgeqpf.Array, DIMENSION LOCc(ja+min(m, n)-1)).Contains the scalar factor tau of elementary reflectors. tauis tied to the distributed matrix A.


work(1)

(global) INTEGER.info= 0, the execution is successful.< 0, if the i-th argument is an array and the j-entry hadan illegal value, then info = - (i* 100+j), if the i-thargument is a scalar and had an illegal value, then info =-i.

Application Notes


1672

Q = H(1) H(2)... H(n),


H = I - tau * v * v'

where tau is a real/complex scalar, and v is a real/complex vector with v(1:i-1) = 0 andv(i) = 1; v(i+1:m) is stored on exit in A(ia+i-1:ia+m-1,ja+i-1).

The matrix P is represented in jpvt as follows: if jpvt(j)= i then the j-th column of P is thei-th canonical unit vector.

p?orgqrGenerates the orthogonal matrix Q of the QRfactorization formed by p?geqrf.

Syntax

call psorgqr(m, n, k, a, ia, ja, desca, tau, work, lwork, info)

call pdorgqr(m, n, k, a, ia, ja, desca, tau, work, lwork, info)

Description

The routine generates the whole or part of m-by-n real distributed matrix Q denotingA(ia:ia+m-1, ja:ja+n-1) with orthonormal columns, which is defined as the first n columnsof a product of k elementary reflectors of order m

Q= H(1) H(2)...H(k)

as returned by p?geqrf.

Input Parameters


sub(Q) (m ≥ 0).

m


sub(Q) (m ≥ n ≥ 0).

n

(global) INTEGER. The number of elementary reflectors

whose product defines the matrix Q (n ≥ k ≥ 0).

k

(local)aREAL for psorgqr

1673


DOUBLE PRECISION for pdorgqrPointer into the local memory to an array of local dimension(lld_a, LOCc(ja+n-1)). The j-th column must containthe vector which defines the elementary reflector H(j),

ja≤j≤ja +k-1, as returned by p?geqrf in the k columnsof its distributed matrix argument A(ia:*, ja:ja+k-1).


ia, ja


desca

(local)tauREAL for psorgqrDOUBLE PRECISION for pdorgqrArray, DIMENSION LOCc(ja+k-1)).Contains the scalar factor tau (j) of elementary reflectorsH(j) as returned by p?geqrf. tau is tied to the distributedmatrix A.

(local)workREAL for psorgqrDOUBLE PRECISION for pdorgqrWorkspace array of dimension of lwork.

(local or global) INTEGER, dimension of work.lwork

Must be at least lwork ≥ nb_a*(nqa0 + mpa0 + nb_a),whereiroffa = mod(ia-1, mb_a), icoffa = mod(ja-1,nb_a),iarow = indxg2p(ia, mb_a, MYROW, rsrc_a, NPROW),iacol = indxg2p(ja, nb_a, MYCOL, csrc_a, NPCOL),mpa0 = numroc(m+iroffa, mb_a, MYROW, iarow,NPROW),nqa0 = numroc(n+icoffa, nb_a, MYCOL, iacol,NPCOL);indxg2p and numroc are ScaLAPACK tool functions; MYROW,MYCOL, NPROW and NPCOL can be determined by calling thesubroutine blacs_gridinfo.

1674


Output Parameters

Contains the local pieces of the m-by-n distributed matrixQ.

a


work(1)

(global) INTEGER.info= 0: the execution is successful.< 0: if the i-th argument is an array and the j-entry hadan illegal value, then info = - (i* 100+j), if the i-thargument is a scalar and had an illegal value, then info =-i.

p?ungqrGenerates the complex unitary matrix Q of the QRfactorization formed by p?geqrf.

Syntax

call pcungqr( m, n, k, a, ia, ja, desca, tau, work, lwork, info)

call pzungqr( m, n, k, a, ia, ja, desca, tau, work, lwork, info)

Description

The routine generates the whole or part of m-by-n complex distributed matrix Q denotingA(ia:ia+m-1, ja:ja+n-1) with orthonormal columns, which is defined as the first n columnsof a product of k elementary reflectors of order m

Q = H(1) H(2)... H(k)


1675

Input Parameters


sub(Q); (m≥0).

m


sub(Q) (m≥n≥0).

n


whose product defines the matrix Q (n≥k≥0).

k

(local)aCOMPLEX for pcungqrDOUBLE COMPLEX for pzungqrPointer into the local memory to an array of dimension(lld_a, LOCc(ja+n-1)).The j-th column must contain the

vector which defines the elementary reflector H(j), ja≤ j≤ja +k-1, as returned by p?geqrf in the k columns of itsdistributed matrix argument A(ia:*, ja:ja+k-1).

(global) INTEGER. The row and column indices in the globalarray a indicating the first row and the first column of thesubmatrix A, respectively.

ia, ja


desca

(local)tauCOMPLEX for pcungqrDOUBLE COMPLEX for pzungqrArray, DIMENSION LOCc(ja+k-1)).Contains the scalar factor tau (j) of elementary reflectorsH(j) as returned by p?geqrf. tau is tied to the distributedmatrix A.

(local)workCOMPLEX for pcungqrDOUBLE COMPLEX for pzungqrWorkspace array of dimension of lwork.


least lwork ≥ nb_a*(nqa0 + mpa0 + nb_a), where

lwork

iroffa = mod(ia-1, mb_a),

1676


icoffa = mod(ja-1, nb_a),iarow = indxg2p(ia, mb_a, MYROW, rsrc_a, NPROW),iacol = indxg2p(ja, nb_a, MYCOL, csrc_a, NPCOL),mpa0 = numroc(m+iroffa, mb_a, MYROW, iarow, NPROW),nqa0 = numroc(n+icoffa, nb_a, MYCOL, iacol, NPCOL)indxg2p and numroc are ScaLAPACK tool functions; MYROW,MYCOL, NPROW and NPCOL can be determined by calling thesubroutine blacs_gridinfo.If lwork = -1, then lwork is global input and a workspacequery is assumed; the routine only calculates the minimumand optimal size for all work arrays. Each of these valuesis returned in the first entry of the corresponding work array,and no error message is issued by pxerbla.

Output Parameters


a


work(1)


p?ormqrMultiplies a general matrix by the orthogonal matrixQ of the QR factorization formed by p?geqrf.

Syntax

call psormqr( side, trans, m, n, k, a, ia, ja, desca, tau, c, ic, jc, descc,work, lwork, info)

call pdormqr( side, trans, m, n, k, a, ia, ja, desca, tau, c, ic, jc, descc,work, lwork, info)

1677


Description

The routine overwrites the general real m-by-n distributed matrix sub(C) =C(ic:ic+m-1,jc:jc+n-1) with

side ='R'side ='L'

sub(C)*QQ*sub(C)trans = 'N':

sub(C)*QTQT*sub(C)trans = 'T':

where Q is a real orthogonal distributed matrix defined as the product of k elementary reflectors

Q = H(1) H(2)... H(k)

as returned by p?geqrf. Q is of order m if side = 'L' and of order n if side = 'R'.

Input Parameters

(global) CHARACTERside='L': Q or QT is applied from the left.='R': Q or QT is applied from the right.

(global) CHARACTERtrans='N', no transpose, Q is applied.='T', transpose, QT is applied.


matrix sub(C) (m≥0).

m


matrix sub(C) (n≥0).

n

(global) INTEGER. The number of elementary reflectorswhose product defines the matrix Q. Constraints:

k

If side = 'L', m≥k≥0

If side = 'R', n≥k≥0.

(local)aREAL for psormqrDOUBLE PRECISION for pdormqr.Pointer into the local memory to an array of dimension(lld_a, LOCc(ja+k-1)). The j-th column must containthe vector which defines the elementary reflector H(j),

ja≤j≤ja+k-1, as returned by p?geqrf in the k columns of

1678


its distributed matrix argument A(ia:*, ja:ja+k-1).A(ia:*, ja:ja+k-1) is modified by the routine butrestored on exit.

If side = 'L', lld_a ≥ max(1, LOCr(ia+m-1)

If side ='R', lld_a ≥ max(1, LOCr(ia+n-1)


ia, ja


desca

(local)tauREAL for psormqrDOUBLE PRECISION for pdormqrArray, DIMENSION LOCc(ja+k-1).Contains the scalar factor tau (j) of elementary reflectorsH(j) as returned by p?geqrf. tau is tied to the distributedmatrix A.

(local)cREAL for psormqrDOUBLE PRECISION for pdormqrPointer into the local memory to an array of local dimension(lld_c, LOCc(jc+n-1)).Contains the local pieces of the distributed matrix sub(C)to be factored.

(global) INTEGER. The row and column indices in the globalarray c indicating the first row and the first column of thesubmatrix C, respectively.

ic, jc

(global and local) INTEGER array, dimension (dlen_). Thearray descriptor for the distributed matrix C.

descc

(local)workREAL for psormqrDOUBLE PRECISION for pdormqr.Workspace array of dimension of lwork.

(local or global) INTEGER, dimension of work, must be atleast:

lwork

if side = 'L',

1679


lwork ≥ max((nb_a*(nb_a-1))/2, (nqc0+mpc0)*nb_a)+ nb_a*nb_aelse if side = 'R',

lwork ≥ max((nb_a*(nb_a-1))/2,(nqc0+max(npa0+numroc(numroc(n+icoffc, nb_a, 0,0, NPCOL), nb_a, 0, 0, lcmq), mpc0))*nb_a) +nb_a*nb_aend ifwherelcmq = lcm/NPCOL with lcm = ilcm(NPROW, NPCOL),iroffa = mod(ia-1, mb_a),icoffa = mod(ja-1, nb_a),iarow = indxg2p(ia, mb_a, MYROW, rsrc_a, NPROW),npa0= numroc(n+iroffa, mb_a, MYROW, iarow,NPROW),iroffc = mod(ic-1, mb_c),icoffc = mod(jc-1, nb_c),icrow = indxg2p(ic, mb_c, MYROW, rsrc_c, NPROW),iccol = indxg2p(jc, nb_c, MYCOL, csrc_c, NPCOL),mpc0= numroc(m+iroffc, mb_c, MYROW, icrow,NPROW),nqc0= numroc(n+icoffc, nb_c, MYCOL, iccol,NPCOL),ilcm, indxg2p and numroc are ScaLAPACK tool functions;MYROW, MYCOL, NPROW and NPCOL can be determined bycalling the subroutine blacs_gridinfo.If lwork = -1, then lwork is global input and a workspacequery is assumed; the routine only calculates the minimumand optimal size for all work arrays. Each of these valuesis returned in the first entry of the corresponding work array,and no error message is issued by pxerbla.

Output Parameters

Overwritten by the product Q*sub(C), or QT*sub(C), orsub(C)*QT, or sub(C)*Q.

c


work(1)

1680



p?unmqrMultiplies a complex matrix by the unitary matrixQ of the QR factorization formed by p?geqrf.

Syntax

call pcunmqr( side, trans, m, n, k, a, ia, ja, desca, tau, c, ic, jc, descc,work, lwork, info )

call pzunmqr( side, trans, m, n, k, a, ia, ja, desca, tau, c, ic, jc, descc,work, lwork, info )

Description

The routine overwrites the general complex m-by-n distributed matrix sub (C) =C(ic:ic+m-1,jc:jc+n-1) with

side ='R'side ='L'


sub(C)*QHQH*sub(C)trans = 'T':

where Q is a complex unitary distributed matrix defined as the product of k elementary reflectors

Q = H(1) H(2)... H(k) as returned by p?geqrf. Q is of order m if side = 'L' and of ordern if side ='R'.

Input Parameters

(global) CHARACTERside='L': Q or QH is applied from the left.='R': Q or QH is applied from the right.

(global) CHARACTERtrans='N', no transpose, Q is applied.

1681


='C', conjugate transpose, QH is applied.



m



n


k



(local)aCOMPLEX for pcunmqrDOUBLE COMPLEX for pzunmqr.Pointer into the local memory to an array of dimension(lld_a, LOCc(ja+k-1)). The j-th column must containthe vector which defines the elementary reflector H(j),

ja≤j≤ja +k-1, as returned by p?geqrf in the k columnsof its distributed matrix argument A(ia:*, ja:ja+k-1).A(ia:*, ja:ja+k-1) is modified by the routine butrestored on exit.

If side= 'L', lld_a≥ max(1, LOCr(ia+m-1)

If side = 'R', lld_a≥ max(1, LOCr(ia+n-1)


ia, ja


desca

(local)tauCOMPLEX for pcunmqrDOUBLE COMPLEX for pzunmqrArray, DIMENSION LOCc(ja+k-1)).Contains the scalar factor tau (j) of elementary reflectorsH(j) as returned by p?geqrf. tau is tied to the distributedmatrix A.

(local)c

1682


COMPLEX for pcunmqrDOUBLE COMPLEX for pzunmqr.Pointer into the local memory to an array of local dimension(lld_c, LOCc(jc+n-1)).Contains the local pieces of the distributed matrix sub(C)to be factored.


ic, jc


descc

(local)workCOMPLEX for pcunmqrDOUBLE COMPLEX for pzunmqr.Workspace array of dimension of lwork.


lwork

If side = 'L',

lwork ≥ max((nb_a*(nb_a-1))/2, (nqc0 +mpc0)*nb_a) + nb_a*nb_aelse if side = 'R',

lwork ≥ max((nb_a*(nb_a-1))/2, (nqc0 + max(npa0+ numroc(numroc(n+icoffc, nb_a, 0, 0, NPCOL),nb_a, 0, 0, lcmq), mpc0))*nb_a) + nb_a*nb_aend ifwherelcmq = lcm/NPCOL with lcm = ilcm (NPROW, NPCOL),iroffa = mod(ia-1, mb_a),icoffa = mod(ja-1, nb_a),iarow = indxg2p(ia, mb_a, MYROW, rsrc_a, NPROW),npa0 = numroc(n+iroffa, mb_a, MYROW, iarow,NPROW),iroffc = mod(ic-1, mb_c),icoffc = mod(jc-1, nb_c),icrow = indxg2p(ic, mb_c, MYROW, rsrc_c, NPROW),iccol = indxg2p(jc, nb_c, MYCOL, csrc_c, NPCOL),

1683


mpc0 = numroc(m+iroffc, mb_c, MYROW, icrow,NPROW),nqc0 = numroc(n+icoffc, nb_c, MYCOL, iccol,NPCOL),ilcm, indxg2p and numroc are ScaLAPACK tool functions;MYROW, MYCOL, NPROW and NPCOL can be determined bycalling the subroutine blacs_gridinfo.If lwork = -1, then lwork is global input and a workspacequery is assumed; the routine only calculates the minimumand optimal size for all work arrays. Each of these valuesis returned in the first entry of the corresponding work array,and no error message is issued by pxerbla.

Output Parameters

Overwritten by the product Q*sub(C), or QH*sub(C), orsub(C)*QH, or sub(C)*Q .

c


work(1)


p?gelqfComputes the LQ factorization of a generalrectangular matrix.

Syntax

call psgelqf( m, n, a, ia, ja, desca, tau, work, lwork, info)

call pdgelqf( m, n, a, ia, ja, desca, tau, work, lwork, info)

call pcgelqf( m, n, a, ia, ja, desca, tau, work, lwork, info)

call pzgelqf( m, n, a, ia, ja, desca, tau, work, lwork, info)

1684


Description

The routine computes the LQ factorization of a real/complex distributed m-by-n matrix sub(A)=A(ia:ia+m-1, ia:ia+n-1) = L*Q.

Input Parameters


sub(Q) (m ≥ 0).

m


sub(Q) (n ≥ 0).

n


whose product defines the matrix Q (n ≥ k ≥ 0).

k

(local)aREAL for psgelqfDOUBLE PRECISION for pdgelqfCOMPLEX for pcgelqfDOUBLE COMPLEX for pzgelqfPointer into the local memory to an array of local dimension(lld_a, LOCc(ja+n-1)).Contains the local pieces of the distributed matrix sub(A)to be factored.

(global) INTEGER. The row and column indices in the globalarray a indicating the first row and the first column of thesubmatrix A((ia:ia+m-1, ia:ia+n-1), respectively.

ia, ja


desca

(local)workREAL for psgelqfDOUBLE PRECISION for pdgelqfCOMPLEX for pcgelqfDOUBLE COMPLEX for pzgelqfWorkspace array of dimension of lwork.


least lwork ≥ mb_a*(mp0 + nq0 + mb_a), where

lwork

iroff = mod(ia-1, mb_a),

1685


icoff = mod(ja-1, nb_a),iarow = indxg2p(ia, mb_a, MYROW, rsrc_a, NPROW),iacol = indxg2p(ja, nb_a, MYCOL, csrc_a, NPCOL),mp0 = numroc(m+iroff, mb_a, MYROW, iarow, NPROW),nq0 = numroc(n+icoff, nb_a, MYCOL, iacol, NPCOL)indxg2p and numroc are ScaLAPACK tool functions; MYROW,MYCOL, NPROW and NPCOL can be determined by calling thesubroutine blacs_gridinfo.If lwork = -1, then lwork is global input and a workspacequery is assumed; the routine only calculates the minimumand optimal size for all work arrays. Each of these valuesis returned in the first entry of the corresponding work array,and no error message is issued by pxerbla.

Output Parameters

The elements on and below the diagonal of sub(A) containthe m by min(m,n) lower trapezoidal matrix L (L is lower

trapezoidal if m ≤ n); the elements above the diagonal, with

a

the array tau, represent the orthogonal/unitary matrix Q asa product of elementary reflectors (see Application Notesbelow)

(local)tauREAL for psgelqfDOUBLE PRECISION for pdgelqfCOMPLEX for pcgelqfDOUBLE COMPLEX for pzgelqfArray, DIMENSION LOCr(ia+min(m, n)-1)).Contains the scalar factors of elementary reflectors. tau istied to the distributed matrix A.


work(1)


1686


Application Notes


Q = H(ia+k-1) H(ia+k-2)... H(ia),

where k = min(m,n)


H(i) = I - tau * v * v'

where tau is a real/complex scalar, and v is a real/complex vector with v(1:i-1) = 0 andv(i) = 1; v(i+1:n) is stored on exit in A(ia+i-1:ia+i-1,ja+n-1), and tau in tau(ia+i-1).

p?orglqGenerates the real orthogonal matrix Q of the LQfactorization formed by p?gelqf.

Syntax

call psorglq( m, n, k, a, ia, ja, desca, tau, work, lwork, info)

call pdorglq( m, n, k, a, ia, ja, desca, tau, work, lwork, info)

Description

The routine generates the whole or part of m-by-n real distributed matrix Q denoting A(ia:ia+m-1,ja:ja+n-1) with orthonormal rows, which is defined as the first m rows of a product of kelementary reflectors of order n

Q = H(k)... H(2) H(1)

as returned by p?gelqf.

Input Parameters


sub(Q); (m≥0).

m


sub(Q) (n≥m≥0).

n

1687



whose product defines the matrix Q (m≥k≥0).

k

(local)aREAL for psorglqDOUBLE PRECISION for pdorglqPointer into the local memory to an array of local dimension(lld_a, LOCc (ja+n-1)). On entry, the i-th row mustcontain the vector which defines the elementary reflector

H(i), ia≤i≤ia+k-1, as returned by p?gelqf in the k rowsof its distributed matrix argument A(ia:ia+k -1, ja:*).

(global) INTEGER. The row and column indices in the globalarray a indicating the first row and the first column of thesubmatrix A((ia:ia+m-1, ja:ja+n-1), respectively.

ia, ja


desca

(local)workREAL for psorglqDOUBLE PRECISION for pdorglqWorkspace array of dimension of lwork.


least lwork ≥ mb_a*(mpa0 + nqa0 + mb_a), where

lwork

iroffa = mod(ia-1, mb_a),icoffa = mod(ja-1, nb_a),iarow = indxg2p(ia, mb_a, MYROW, rsrc_a, NPROW),iacol = indxg2p(ja, nb_a, MYCOL, csrc_a, NPCOL),mpa0 = numroc(m+iroffa, mb_a, MYROW, iarow,NPROW),nqa0 = numroc(n+icoffa, nb_a, MYCOL, iacol, NPCOL)indxg2p and numroc are ScaLAPACK tool functions; MYROW,MYCOL, NPROW and NPCOL can be determined by calling thesubroutine blacs_gridinfo.If lwork = -1, then lwork is global input and a workspacequery is assumed; the routine only calculates the minimumand optimal size for all work arrays. Each of these valuesis returned in the first entry of the corresponding work array,and no error message is issued by pxerbla.

1688


Output Parameters

Contains the local pieces of the m-by-n distributed matrix Qto be factored.

a

(local)tauREAL for psorglqDOUBLE PRECISION for pdorglqArray, DIMENSION LOCr(ia+k-1).Contains the scalar factors tau of elementary reflectors H(i).tau is tied to the distributed matrix A.


work(1)


p?unglqGenerates the unitary matrix Q of the LQfactorization formed by p?gelqf.

Syntax

call pcunglq( m, n, k, a, ia, ja, desca, tau, work, lwork, info)

call pzunglq( m, n, k, a, ia, ja, desca, tau, work, lwork, info)

Description

The routine generates the whole or part of m-by-n complex distributed matrix Q denotingA(ia:ia+m-1, ja:ja+n-1) with orthonormal rows, which is defined as the first m rows of aproduct of k elementary reflectors of order n

Q = H(k)... H(2)' H(1)'


1689


Input Parameters


sub(Q) (m≥0).

m


sub(Q) (n≥m≥0).

n



k

(local)aCOMPLEX for pcunglqDOUBLE COMPLEX for pzunglqPointer into the local memory to an array of local dimension(lld_a, LOCc(ja+n-1)). On entry, the i-th row mustcontain the vector which defines the elementary reflector

H(i), ia≤i≤ia+k-1, as returned by p?gelqf in the k rowsof its distributed matrix argument A(ia:ia+k-1, ja:*).


ia, ja


desca

(local)tauCOMPLEX for pcunglqDOUBLE COMPLEX for pzunglqArray, DIMENSION LOCr(ia+k-1)).Contains the scalar factors tau of elementary reflectors H(i).tau is tied to the distributed matrix A.

(local)workCOMPLEX for pcunglqDOUBLE COMPLEX for pzunglqWorkspace array of dimension of lwork.


least lwork ≥ mb_a*(mpa0 + nqa0 + mb_a), where

lwork

iroffa = mod(ia-1, mb_a),icoffa = mod(ja-1, nb_a),

1690


iarow = indxg2p(ia, mb_a, MYROW, rsrc_a, NPROW),iacol = indxg2p(ja, nb_a, MYCOL, csrc_a, NPCOL),mpa0 = numroc(m+iroffa, mb_a, MYROW, iarow,NPROW),nqa0 = numroc(n+icoffa, nb_a, MYCOL, iacol,NPCOL)indxg2p and numroc are ScaLAPACK tool functions; MYROW,MYCOL, NPROW and NPCOL can be determined by calling thesubroutine blacs_gridinfo.If lwork = -1, then lwork is global input and a workspacequery is assumed; the routine only calculates the minimumand optimal size for all work arrays. Each of these valuesis returned in the first entry of the corresponding work array,and no error message is issued by pxerbla.

Output Parameters


a


work(1)


p?ormlqMultiplies a general matrix by the orthogonal matrixQ of the LQ factorization formed by p?gelqf.

Syntax

call psormlq( side, trans, m, n, k, a, ia, ja, desca, tau, c, ic, jc, work,lwork, info)

call pdormlq( side, trans, m, n, k, a, ia, ja, desca, tau, c, ic, jc, work,lwork, info)

1691


Description


side ='R'side ='L'




Q = H(k)...H(2) H(1)

as returned by p?gelqf. Q is of order m if side = 'L' and of order n if side = 'R'.

Input Parameters





m



n


k



(local)aREAL for psormlqDOUBLE PRECISION for pdormlq.Pointer into the local memory to an array of dimension(lld_a, LOCc(ja+m-1)), if side = 'L' and (lld_a,LOCc(ja+n-1)), if side = 'R'.The i-th row must contain

1692


the vector which defines the elementary reflector H(i),

ia≤i≤ia+k-1, as returned by p?gelqf in the k rows of itsdistributed matrix argument A(ia:ia+k-1, ja:*).A(ia:ia+k-1, ja:*) is modified by the routine butrestored on exit.


ia, ja


desca

(local)tauREAL for psormlqDOUBLE PRECISION for pdormlqArray, DIMENSION LOCc(ja+k-1)).Contains the scalar factor tau (i) of elementary reflectorsH(i) as returned by p?gelqf. tau is tied to the distributedmatrix A.

(local)cREAL for psormlqDOUBLE PRECISION for pdormlqPointer into the local memory to an array of local dimension(lld_c, LOCc(jc+n-1)).Contains the local pieces of the distributed matrix sub(C)to be factored.


ic, jc


descc

(local)workREAL for psormlqDOUBLE PRECISION for pdormlq.Workspace array of dimension of lwork.

(local or global) INTEGER, dimension of the array work;must be at least:

lwork

If side = 'L',

1693


lwork ≥ max((mb_a*(mb_a-1))/2, (mpc0+max mqa0)+numroc(numroc(m + iroffc, mb_a, 0, 0, NPROW),mb_a, 0, 0, lcmp), nqc0))* mb_a) + mb_a*mb_aelse if side = 'R',

lwork ≥ max((mb_a* (mb_a-1))/2, (mpc0+nqc0)*mb_a+ mb_a*mb_aend ifwherelcmp = lcm/NPROW with lcm = ilcm (NPROW, NPCOL),iroffa = mod(ia-1, mb_a),icoffa = mod(ja-1, nb_a),iacol = indxg2p(ja, nb_a, MYCOL, csrc_a, NPCOL),mqa0 = numroc(m+icoffa, nb_a, MYCOL, iacol,NPCOL),iroffc = mod(ic-1, mb_c),icoffc = mod(jc-1, nb_c),icrow = indxg2p(ic, mb_c, MYROW, rsrc_c, NPROW),iccol = indxg2p(jc, nb_c, MYCOL, csrc_c, NPCOL),mpc0 = numroc(m+iroffc, mb_c, MYROW, icrow,NPROW),nqc0 = numroc(n+icoffc, nb_c, MYCOL, iccol,NPCOL),ilcm, indxg2p and numroc are ScaLAPACK tool functions;MYROW, MYCOL, NPROW and NPCOL can be determined bycalling the subroutine blacs_gridinfo.If lwork = -1, then lwork is global input and a workspacequery is assumed; the routine only calculates the minimumand optimal size for all work arrays. Each of these valuesis returned in the first entry of the corresponding work array,and no error message is issued by pxerbla.

Output Parameters

Overwritten by the product Q*sub(C), or Q' *sub (C), orsub(C)*Q', or sub(C)*Q

c


work(1)

(global) INTEGER.info

1694


= 0: the execution is successful.< 0: if the i-th argument is an array and the j-entry hadan illegal value, then info = - (i* 100+j), if the i-thargument is a scalar and had an illegal value, then info =-i.

p?unmlqMultiplies a general matrix by the unitary matrixQ of the LQ factorization formed by p?gelqf.

Syntax

call pcunmlq( side, trans, m, n, k, a, ia, ja, desca, tau, c, ic, jc, descc,work, lwork, info )

call pzunmlq( side, trans, m, n, k, a, ia, ja, desca, tau, c, ic, jc, descc,work, lwork, info )

Description

The routine overwrites the general complex m-by-n distributed matrix sub (C) = C(ic:ic+m-1,jc:jc+n-1) with

side ='R'side ='L'


sub(C)*QHQH*sub(C)trans = 'T':


Q = H(k)' ... H(2)' H(1)'

as returned by p?gelqf. Q is of order m if side = 'L' and of order n if side = 'R'.

Input Parameters


(global) CHARACTERtrans='N', no transpose, Q is applied.='C', conjugate transpose, QH is applied.

1695



m



n


k



(local)aCOMPLEX for pcunmlqDOUBLE COMPLEX for pzunmlq.Pointer into the local memory to an array of dimension(lld_a, LOCc(ja+m-1)), if side = 'L', and (lld_a,

LOCc(ja+n-1)), if side = 'R', where lld_a ≥ max(1,LOCr (ia+k-1)). The i-th column must contain the vector

which defines the elementary reflector H(i), ia≤i≤ia+k-1,as returned by p?gelqf in the k rows of its distributedmatrix argument A(ia:ia+k-1, ja:*). A(ia:ia+k-1,ja:*) is modified by the routine but restored on exit.


ia, ja


desca

(local)tauCOMPLEX for pcunmlqDOUBLE COMPLEX for pzunmlqArray, DIMENSION LOCc(ia+k-1)).Contains the scalar factor tau (i) of elementary reflectorsH(i) as returned by p?gelqf. tau is tied to the distributedmatrix A.

(local)cCOMPLEX for pcunmlqDOUBLE COMPLEX for pzunmlq.

1696


Pointer into the local memory to an array of local dimension(lld_c, LOCc(jc+n-1)).Contains the local pieces of the distributed matrix sub(C)to be factored.


ic, jc


descc

(local)workCOMPLEX for pcunmlqDOUBLE COMPLEX for pzunmlq.Workspace array of dimension of lwork.

(local or global) INTEGER, dimension of the array work;must be at least:

lwork

If side = 'L',

lwork ≥ max((mb_a*(mb_a-1))/2, (mpc0 + max mqa0)+numroc(numroc(m + iroffc, mb_a, 0, 0, NPROW),mb_a, 0, 0, lcmp), nqc0))*mb_a) + mb_a*mb_aelse if side = 'R',

lwork ≥ max((mb_a* (mb_a-1))/2, (mpc0 +nqc0)*mb_a + mb_a*mb_aend ifwherelcmp = lcm/NPROW with lcm = ilcm (NPROW, NPCOL),iroffa = mod(ia-1, mb_a),icoffa = mod(ja-1, nb_a),iacol = indxg2p(ja, nb_a, MYCOL, csrc_a, NPCOL),mqa0 = numroc(m + icoffa, nb_a, MYCOL, iacol,NPCOL),iroffc = mod(ic-1, mb_c),icoffc = mod(jc-1, nb_c),icrow = indxg2p(ic, mb_c, MYROW, rsrc_c, NPROW),iccol = indxg2p(jc, nb_c, MYCOL, csrc_c, NPCOL),mpc0 = numroc(m+iroffc, mb_c, MYROW, icrow,NPROW),

1697


nqc0 = numroc(n+icoffc, nb_c, MYCOL, iccol,NPCOL),ilcm, indxg2p and numroc are ScaLAPACK tool functions;MYROW, MYCOL, NPROW and NPCOL can be determined bycalling the subroutine blacs_gridinfo.If lwork = -1, then lwork is global input and a workspacequery is assumed; the routine only calculates the minimumand optimal size for all work arrays. Each of these valuesis returned in the first entry of the corresponding work array,and no error message is issued by pxerbla.

Output Parameters

Overwritten by the product Q*sub(C), or Q'*sub (C), orsub(C)*Q', or sub(C)*Q

c


work(1)


p?geqlfComputes the QL factorization of a general matrix.

Syntax

call psgeqlf(m, n, a, ia, ja, desca, tau, work, lwork, info)

call pdgeqlf(m, n, a, ia, ja, desca, tau, work, lwork, info)

call pcgeqlf(m, n, a, ia, ja, desca, tau, work, lwork, info)

call pzgeqlf(m, n, a, ia, ja, desca, tau, work, lwork, info)

1698


Description

The routine forms the QL factorization of a real/complex distributed m-by-n matrix sub(A) =A(ia:ia+m-1, ja:ja+n-1) = Q*L.

Input Parameters


sub(Q); (m ≥ 0).

m


sub(Q) (n ≥ 0).

n

(local)aREAL for psgeqlfDOUBLE PRECISION for pdgeqlfCOMPLEX for pcgeqlfDOUBLE COMPLEX for pzgeqlfPointer into the local memory to an array of local dimension(lld_a, LOCc(ja+n-1)). Contains the local pieces of thedistributed matrix sub(A) to be factored.

(global) INTEGER. The row and column indices in the globalarray a indicating the first row and the first column of thesubmatrix A((ia:ia+m-1, ia:ia+n-1), respectively.

ia, ja


desca

(local)workREAL for psgeqlfDOUBLE PRECISION for pdgeqlfCOMPLEX for pcgeqlfDOUBLE COMPLEX for pzgeqlfWorkspace array of dimension of lwork.


least lwork ≥ nb_a*(mp0 + nq0 + nb_a), where

lwork

iroff = mod(ia-1, mb_a),icoff = mod(ja-1, nb_a),iarow = indxg2p(ia, mb_a, MYROW, rsrc_a, NPROW),iacol = indxg2p(ja, nb_a, MYCOL, csrc_a, NPCOL),mp0 = numroc(m+iroff, mb_a, MYROW, iarow, NPROW),

1699


nq0 = numroc(n+icoff, nb_a, MYCOL, iacol, NPCOL)numroc and indxg2p are ScaLAPACK tool functions; MYROW,MYCOL, NPROW and NPCOL can be determined by calling thesubroutine blacs_gridinfo.If lwork = -1, then lwork is global input and a workspacequery is assumed; the routine only calculates the minimumand optimal size for all work arrays. Each of these valuesis returned in the first entry of the corresponding work array,and no error message is issued by pxerbla.

Output Parameters

On exit, if m≥n, the lower triangle of the distributedsubmatrix A(ia+m-n:ia+m-1, ja:ja+n-1) contains the

n-by-n lower triangular matrix L; if m≤n, the elements on

a

and below the (n-m)-th superdiagonal contain the m-by-nlower trapezoidal matrix L; the remaining elements, withthe array tau, represent the orthogonal/unitary matrix Q asa product of elementary reflectors (see Application Notesbelow)

(local)tauREAL for psgeqlfDOUBLE PRECISION for pdgeqlfCOMPLEX for pcgeqlfDOUBLE COMPLEX for pzgeqlfArray, DIMENSION LOCc(ja+n-1)).Contains the scalar factors of elementary reflectors. tau istied to the distributed matrix A.


work(1)


1700


Application Notes


Q = H(ja+k-1)... H(ja+1) H(ja),

where k = min(m,n)


H(i) = I - tau * v * v'

where tau is a real/complex scalar, and v is a real/complex vector with v(m-k+i+1:m) = 0and v(m-k+i) = 1; v(m-k+i-1) is stored on exit in A(ia+ia+m-k+i-2, ja+n-k+i-1), andtau in tau (ja+n-k+i-1).

p?orgqlGenerates the orthogonal matrix Q of the QLfactorization formed by p?geqlf.

Syntax

call psorgql( m, n, k, a, ia, ja, desca, tau, work, lwork, info)

call pdorgql( m, n, k, a, ia, ja, desca, tau, work, lwork, info)

Description

The routine generates the whole or part of m-by-n real distributed matrix Q denotingA(ia:ia+m-1, ja:ja+n-1) with orthonormal rows, which is defined as the first m rows of aproduct of k elementary reflectors of order n

Q = H(k)... H(2) H(1)

as returned by p?geqlf.

Input Parameters


sub(Q); (m≥0).

m


sub(Q)(m≥n≥0).

n

1701




k

(local)aREAL for psorgqlDOUBLE PRECISION for pdorgqlPointer into the local memory to an array of local dimension(lld_a, LOCc (ja+n-1)). On entry, the j-th column mustcontain the vector which defines the elementary reflector

H(j),ja+n-k≤j≤ja+n-1, as returned by p?geqlf in the k

columns of its distributed matrix argumentA(ia:*ja+n-k:ja+n-1).

(global) INTEGER. The row and column indices in the globalarray a indicating the first row and the first column of thesubmatrix A((ia:ia+m-1, ja:ja+n-1), respectively.

ia, ja


desca

(local)tauREAL for psorgqlDOUBLE PRECISION for pdorgqlArray, DIMENSION LOCc(ja+n-1)).Contains the scalar factors tau(j) of elementary reflectorsH(j). tau is tied to the distributed matrix A.

(local)workREAL for psorgqlDOUBLE PRECISION for pdorgqlWorkspace array of dimension of lwork.


least lwork ≥ nb_a* (nqa0 + mpa0 + nb_a), where

lwork

iroffa = mod(ia-1, mb_a),icoffa = mod(ja-1, nb_a),iarow = indxg2p(ia, mb_a, MYROW, rsrc_a, NPROW),iacol = indxg2p(ja, nb_a, MYCOL, csrc_a, NPCOL),mpa0 = numroc(m+iroffa, mb_a, MYROW, iarow,NPROW),nqa0 = numroc(n+icoffa, nb_a, MYCOL, iacol,NPCOL)

1702


indxg2p and numroc are ScaLAPACK tool functions; MYROW,MYCOL, NPROW and NPCOL can be determined by calling thesubroutine blacs_gridinfo.If lwork = -1, then lwork is global input and a workspacequery is assumed; the routine only calculates the minimumand optimal size for all work arrays. Each of these valuesis returned in the first entry of the corresponding work array,and no error message is issued by pxerbla.

Output Parameters


a


work(1)


p?ungqlGenerates the unitary matrix Q of the QLfactorization formed by p?geqlf.

Syntax

call pcungql( m, n, k, a, ia, ja, desca, tau, work, lwork, info)

call pzungql( m, n, k, a, ia, ja, desca, tau, work, lwork, info)

Description

The routine generates the whole or part of m-by-n complex distributed matrix Q denotingA(ia:ia+m-1, ja:ja+n-1) with orthonormal rows, which is defined as the first n columns ofa product of k elementary reflectors of order m

Q = H(k)... H(2)' H(1)'

1703


as returned by p?geqlf.

Input Parameters


sub(Q) (m≥0).

m


sub(Q) (m≥n≥0).

n



k

(local)aCOMPLEX for pcungqlDOUBLE COMPLEX for pzungql Pointer into the local memoryto an array of local dimension (lld_a, LOCc(ja+n-1)).On entry, the j-th column must contain the vector which

defines the elementary reflector H(j), ja+n-k≤ j≤ ja+n-1,as returned by p?geqlf in the k columns of its distributedmatrix argument A(ia:*, ja+n-k: ja+n-1).


ia, ja


desca

(local)tauCOMPLEX for pcungqlDOUBLE COMPLEX for pzungqlArray, DIMENSION LOCr(ia+n-1)).Contains the scalar factors tau (j) of elementary reflectorsH(j). tau is tied to the distributed matrix A.

(local)workCOMPLEX for pcungqlDOUBLE COMPLEX for pzungqlWorkspace array of dimension of lwork.


least lwork ≥ nb_a*(nqa0 + mpa0 + nb_a), where

lwork

1704


iroffa = mod(ia-1, mb_a),icoffa = mod(ja-1, nb_a),iarow = indxg2p(ia, mb_a, MYROW, rsrc_a, NPROW),iacol = indxg2p(ja, nb_a, MYCOL, csrc_a, NPCOL),mpa0 = numroc(m+iroffa, mb_a, MYROW, iarow,NPROW),nqa0 = numroc(n+icoffa, nb_a, MYCOL, iacol, NPCOL)indxg2p and numroc are ScaLAPACK tool functions; MYROW,MYCOL, NPROW and NPCOL can be determined by calling thesubroutine blacs_gridinfo.If lwork = -1, then lwork is global input and a workspacequery is assumed; the routine only calculates the minimumand optimal size for all work arrays. Each of these valuesis returned in the first entry of the corresponding work array,and no error message is issued by pxerbla.

Output Parameters


a


work(1)


1705


p?ormqlMultiplies a general matrix by the orthogonal matrixQ of the QL factorization formed by p?geqlf.

Syntax

call psormql( side, trans, m, n, k, a, ia, ja, desca, tau, c, ic, jc, descc,work, lwork, info)

call pdormql( side, trans, m, n, k, a, ia, ja, desca, tau, c, ic, jc, descc,work, lwork, info)

Description

The routine overwrites the general real m-by-n distributed matrix sub(C) = C(ic:ic+m-1,jc:jc+n-1) with

side ='R'side ='L'




Q = H(k)' ... H(2)' H(1)'

as returned by p?geqlf. Q is of order m if side = 'L' and of order n if side = 'R'.

Input Parameters





m



n

1706



k



(local)aREAL for psormqlDOUBLE PRECISION for pdormql.Pointer into the local memory to an array of dimension(lld_a, LOCc(ja+k-1)). The j-th column must containthe vector which defines the elementary reflector H(j),

ja≤j≤ja+k-1, as returned by p?gelqf in the k columns ofits distributed matrix argument A(ia:*,ja:ja+k-1).A(ia:*, ja:ja+k-1) is modified by theroutine but restored on exit.

If side = 'L',lld_a ≥ max(1, LOCr(ia+m-1)),

If side = 'R', lld_a ≥ max(1, LOCr(ia+n-1)).


ia, ja


desca

(local)tauREAL for psormqlDOUBLE PRECISION for pdormql.Array, DIMENSION LOCc(ja+n-1) ).Contains the scalar factor tau (j) of elementary reflectorsH(j) as returned by p?geqlf. tau is tied to the distributedmatrix A.

(local)cREAL for psormqlDOUBLE PRECISION for pdormql.Pointer into the local memory to an array of local dimension(lld_c, LOCc(jc+n-1)).Contains the local pieces of the distributed matrix sub(C)to be factored.

1707



ic, jc


descc

(local)workREAL for psormqlDOUBLE PRECISION for pdormql.Workspace array of dimension of lwork.


lwork

If side = 'L',

lwork ≥ max((nb_a*(nb_a-1))/2, (nqc0+mpc0)*nb_a+ nb_a*nb_aelse if side ='R',

lwork ≥ max((nb_a*(nb_a-1))/2, (nqc0+max npa0)+numroc(numroc(n+icoffc, nb_a, 0, 0, NPCOL),nb_a, 0, 0, lcmq), mpc0))*nb_a) + nb_a*nb_aend ifwherelcmp = lcm/NPCOL with lcm = ilcm (NPROW, NPCOL),iroffa = mod(ia-1, mb_a),icoffa = mod(ja-1, nb_a),iarow = indxg2p(ia, mb_a, MYROW, rsrc_a, NPROW),npa0= numroc(n + iroffa, mb_a, MYROW, iarow,NPROW),iroffc = mod(ic-1, mb_c),icoffc = mod(jc-1, nb_c),icrow = indxg2p(ic, mb_c, MYROW, rsrc_c, NPROW),iccol = indxg2p(jc, nb_c, MYCOL, csrc_c, NPCOL),mpc0 = numroc(m+iroffc, mb_c, MYROW, icrow,NPROW),nqc0 = numroc(n+icoffc, nb_c, MYCOL, iccol,NPCOL),ilcm, indxg2p and numroc are ScaLAPACK tool functions;MYROW, MYCOL, NPROW and NPCOL can be determined bycalling the subroutine blacs_gridinfo.

1708



Output Parameters

Overwritten by the product Q* sub(C), or Q'*sub (C), orsub(C)* Q', or sub(C)* Q

c


work(1)


p?unmqlMultiplies a general matrix by the unitary matrixQ of the QL factorization formed by p?geqlf.

Syntax

call pcunmql( side, trans, m, n, k, a, ia, ja, desca, tau, c, ic, jc, descc,work, lwork, info)

call pzunmql( side, trans, m, n, k, a, ia, ja, desca, tau, c, ic, jc, descc,work, lwork, info)

Description

The routine overwrites the general complex m-by-n distributed matrix sub(C) = C(ic:ic+m-1,jc:jc+n-1) with

side ='R'side ='L'


sub(C)*QHQH*sub(C)trans = 'C':

1709



Q = H(k)' ... H(2)' H(1)'

as returned by p?geqlf. Q is of order m if side = 'L' and of order n if side = 'R'.

Input Parameters





m



n


k



(local)aCOMPLEX for pcunmqlDOUBLE COMPLEX for pzunmql.Pointer into the local memory to an array of dimension(lld_a, LOCc(ja+k-1)). The j-th column must containthe vector which defines the elementary reflector H(j),

ja≤j≤ja+k-1, as returned by p?geqlf in the k columns ofits distributed matrix argument A(ia:*,ja:ja+k-1).A(ia:*, ja:ja+k-1) is modified by theroutine but restored on exit.

If side = 'L',lld_a ≥ max(1, LOCr(ia+m-1)),

If side = 'R', lld_a ≥ max(1, LOCr(ia+n-1)).

1710



ia, ja


desca

(local)tauCOMPLEX for pcunmqlDOUBLE COMPLEX for pzunmqlArray, DIMENSION LOCc(ia+n-1)).Contains the scalar factor tau (j) of elementary reflectorsH(j) as returned by p?geqlf. tau is tied to the distributedmatrix A.

(local)cCOMPLEX for pcunmqlDOUBLE COMPLEX for pzunmql.Pointer into the local memory to an array of local dimension(lld_c, LOCc(jc+n-1)).Contains the local pieces of the distributed matrix sub(C)to be factored.


ic, jc


descc

(local)workCOMPLEX for pcunmqlDOUBLE COMPLEX for pzunmql.Workspace array of dimension of lwork.


lwork

If side = 'L',

lwork ≥ max((nb_a* (nb_a-1))/2, (nqc0+mpc0)*nb_a+ nb_a*nb_aelse if side ='R',

1711


lwork ≥ max((nb_a*(nb_a-1))/2, (nqc0+max npa0)+numroc(numroc(n+icoffc, nb_a, 0, 0, NPCOL),nb_a, 0, 0, lcmq), mpc0))*nb_a) + nb_a*nb_aend ifwherelcmp = lcm/NPCOL with lcm = ilcm (NPROW, NPCOL),iroffa = mod(ia-1, mb_a),icoffa = mod(ja-1, nb_a),iarow = indxg2p(ia, mb_a, MYROW, rsrc_a, NPROW),npa0 = numroc (n + iroffa, mb_a, MYROW, iarow,NPROW),iroffc = mod(ic-1, mb_c),icoffc = mod(jc-1, nb_c),icrow = indxg2p(ic, mb_c, MYROW, rsrc_c, NPROW),iccol = indxg2p(jc, nb_c, MYCOL, csrc_c, NPCOL),mpc0 = numroc(m+iroffc, mb_c, MYROW, icrow,NPROW),nqc0 = numroc(n+icoffc, nb_c, MYCOL, iccol,NPCOL),ilcm, indxg2p and numroc are ScaLAPACK tool functions;MYROW, MYCOL, NPROW and NPCOL can be determined bycalling the subroutine blacs_gridinfo.If lwork = -1, then lwork is global input and a workspacequery is assumed; the routine only calculates the minimumand optimal size for all work arrays. Each of these valuesis returned in the first entry of the corresponding work array,and no error message is issued by pxerbla.

Output Parameters

Overwritten by the product Q* sub(C), or Q' sub (C), orsub(C)* Q', or sub(C)* Q

c


work(1)

(global) INTEGER.info= 0: the execution is successful.

1712


< 0: if the i-th argument is an array and the j-entry hadan illegal value, then info = - (i* 100+j), if the i-thargument is a scalar and had an illegal value, then info =-i.

p?gerqfComputes the RQ factorization of a generalrectangular matrix.

Syntax

call psgerqf( m, n, a, ia, ja, desca, tau, work, lwork, info)

call pdgerqf( m, n, a, ia, ja, desca, tau, work, lwork, info)

call pcgerqf( m, n, a, ia, ja, desca, tau, work, lwork, info)

call pzgerqf( m, n, a, ia, ja, desca, tau, work, lwork, info)

Description

The routine forms the QR factorization of a general m-by-n distributed matrix sub(A)=A(ia:ia+m-1,ja:ja+n-1) as

A= R*Q

Input Parameters


submatrix sub(A); (m≥0).

m


submatrix sub(A); (n≥0).

n

(local)aREAL for psgeqrfDOUBLE PRECISION for pdgeqrfCOMPLEX for pcgeqrfDOUBLE COMPLEX for pzgeqrf.Pointer into the local memory to an array of local dimension(lld_a, LOCc(ja+n-1)).

1713

Contains the local pieces of the distributed matrix sub(A)to be factored.


ia, ja

(global and local) INTEGER array, dimension (dlen_). Thearray descriptor for the distributed matrix A

desca

(local).workREAL for psgeqrfDOUBLE PRECISION for pdgeqrf.COMPLEX for pcgeqrf.DOUBLE COMPLEX for pzgeqrfWorkspace array of dimension lwork.


least lwork ≥ mb_a*(mp0+nq0+mb_a), where

lwork

iroff = mod(ia-1, mb_a),icoff = mod(ja-1, nb_a),iarow = indxg2p(ia, mb_a, MYROW, rsrc_a, NPROW),iacol = indxg2p(ja, nb_a, MYCOL, csrc_a, NPCOL),mp0 = numroc(m+iroff, mb_a, MYROW, iarow, NPROW),nq0 = numroc(n+icoff, nb_a, MYCOL, iacol, NPCOL)and numroc, indxg2p are ScaLAPACK tool functions; MYROW,MYCOL, NPROW and NPCOL can be determined by calling thesubroutine blacs_gridinfo.If lwork = -1, then lwork is global input and a workspacequery is assumed; the routine only calculates the minimumand optimal size for all work arrays. Each of these valuesis returned in the first entry of the corresponding work array,and no error message is issued by pxerbla.

Output Parameters

On exit, if m≤n, the upper triangle of A(ia:ia+m-1,ja:ja+n-1) contains the m-by-m upper triangular matrix R;

if m≥n, the elements on and above the (m - n)-th subdiagonal

a

contain the m-by-n upper trapezoidal matrix R; the remaining

1714


elements, with the array tau, represent theorthogonal/unitary matrix Q as a product of elementaryreflectors (see Application Notes below)

(local)tauREAL for psgeqrfDOUBLE PRECISION for pdgeqrfCOMPLEX for pcgeqrfDOUBLE COMPLEX for pzgeqrf.Array, DIMENSION LOCr(ia+m-1).Contains the scalar factor tau of elementary reflectors. tauis tied to the distributed matrix A.


work(1)

(global) INTEGER.info= 0, the execution is successful.< 0, if the i-th argument is an array and the j-entry hadan illegal value, then info = -(i* 100+j), if the i-thargument is a scalar and had an illegal value, then info =-i.

Application Notes


Q = H(ia) H(ia+1)... H(ia+k-1),

where k = min(m,n).


H(i) = I - tau * v * v'

where tau is a real/complex scalar, and v is a real/complex vector with v(n-k+i+1:n) = 0and v(n-k+i) = 1; v(1:n-k+i-1)/conjg (v(1:n-k+i-1)) is stored on exit inA(ia+m-k+i-1,ja:ja+n-k+i-2), and tau in tau(ia+m-k+i-1).

1715

p?orgrqGenerates the orthogonal matrix Q of the RQfactorization formed by p?gerqf.

Syntax

call psorgrq( m, n, k, a, ia, ja, desca, tau, work, lwork, info)

call pdorgrq( m, n, k, a, ia, ja, desca, tau, work, lwork, info)

Description

The routine generates the whole or part of m-by-n real distributed matrix Q denotingA(ia:ia+m-1, ja:ja+n-1) with orthonormal columns, which is defined as the last m rows ofa product of k elementary reflectors of order m

Q= H(1) H(2)...H(k)

as returned by p?gerqf.

Input Parameters


sub(Q); (m≥0).

m


sub(Q) (n≥m≥0).

n



k

(local)aREAL for psorgrqDOUBLE PRECISION for pdorgrqPointer into the local memory to an array of local dimension(lld_a, LOCc(ja+n-1)). The i-th column must containthe vector which defines the elementary reflector H(i),

ja≤j≤ja+k-1, as returned by p?geqrf in the k columns ofits distributed matrix argument A(ia:*, ja:ja+k-1).


ia, ja

1716



desca

(local)tauREAL for psorgrqDOUBLE PRECISION for pdorgrqArray, DIMENSION LOCc(ja+k-1)).Contains the scalar factor tau (i) of elementary reflectorsH(i) as returned by p?gerqf. tau is tied to the distributedmatrix A.

(local)workREAL for psorgrqDOUBLE PRECISION for pdorgrqWorkspace array of dimension of lwork.


least lwork≥mb_a*(mpa0 + nqa0 + mb_a), where

lwork

iroffa = mod(ia-1, mb_a),icoffa = mod(ja-1, nb_a),iarow = indxg2p(ia, mb_a, MYROW, rsrc_a, NPROW),iacol = indxg2p(ja, nb_a, MYCOL, csrc_a, NPCOL),mpa0 = numroc(m+iroffa, mb_a, MYROW, iarow,NPROW),nqa0 = numroc(n+icoffa, nb_a, MYCOL, iacol,NPCOL)indxg2p and numroc are ScaLAPACK tool functions; MYROW,MYCOL, NPROW and NPCOL can be determined by calling thesubroutine blacs_gridinfo.If lwork = -1, then lwork is global input and a workspacequery is assumed; the routine only calculates the minimumand optimal size for all work arrays. Each of these valuesis returned in the first entry of the corresponding work array,and no error message is issued by pxerbla.

Output Parameters


a


work(1)

1717



p?ungrqGenerates the unitary matrix Q of the RQfactorization formed by p?gerqf.

Syntax

call pcungrq( m, n, k, a, ia, ja, desca, tau, work, lwork, info)

call pzungrq( m, n, k, a, ia, ja, desca, tau, work, lwork, info)

Description

The routine generates the m-by-n complex distributed matrix Q denotingA(ia:ia+m-1,ja:ja+n-1) with orthonormal rows, which is defined as the last m rows of aproduct of k elementary reflectors of order n

Q = H(1)' H(2)'... H(k)'


Input Parameters


sub(Q); (m≥0).

m


sub(Q) (n≥m≥0).

n



k

(local)aCOMPLEX for pcungrqDOUBLE COMPLEX for pzungrqc

1718


Pointer into the local memory to an array of dimension(lld_a, LOCc(ja+n-1)). The i-th row must contain thevector which defines the elementary reflector H(i),

ia+m-k≤i≤ia+m-1, as returned by p?gerqf in the k rowsof its distributed matrix argument A(ia+m-k:ia+m-1,ja:*).


ia, ja


desca

(local)tauCOMPLEX for pcungrqDOUBLE COMPLEX for pzungrqArray, DIMENSION LOCr(ia+m-1)).Contains the scalar factor tau (i) of elementary reflectorsH(i) as returned by p?gerqf. tau is tied to the distributedmatrix A.

(local)workCOMPLEX for pcungrqDOUBLE COMPLEX for pzungrqWorkspace array of dimension of lwork.


least lwork ≥ mb_a*(mpa0 +nqa0+mb_a), where

lwork

iroffa = mod(ia-1, mb_a),icoffa = mod(ja-1, nb_a),iarow = indxg2p(ia, mb_a, MYROW, rsrc_a, NPROW),iacol = indxg2p(ja, nb_a, MYCOL, csrc_a, NPCOL),mpa0 = numroc(m+iroffa, mb_a, MYROW, iarow,NPROW),nqa0 = numroc(n+icoffa, nb_a, MYCOL, iacol, NPCOL)indxg2p and numroc are ScaLAPACK tool functions; MYROW,MYCOL, NPROW and NPCOL can be determined by calling thesubroutine blacs_gridinfo.

1719



Output Parameters


a


work(1)


p?ormrqMultiplies a general matrix by the orthogonal matrixQ of the RQ factorization formed by p?gerqf.

Syntax

call psormrq( side, trans, m, n, k, a, ia, ja, desca, tau, c, ic, jc, descc,work, lwork, info)

call pdormrq( side, trans, m, n, k, a, ia, ja, desca, tau, c, ic, jc, descc,work, lwork, info)

Description


side ='R'side ='L'



1720



Q = H(1) H(2)... H(k)

as returned by p?gerqf. Q is of order m if side = 'L' and of order n if side = 'R'.

Input Parameters





m



n


k



(local)aREAL for psormqrDOUBLE PRECISION for pdormqr.Pointer into the local memory to an array of dimension(lld_a, LOCc(ja+m-1)) if side = 'L', and (lld_a,LOCc(ja+n-1)) if side = 'R'.The i-th row must contain the vector which defines the

elementary reflector H(i), ia≤i≤ia+k-1, as returned byp?gerqf in the k rows of its distributed matrix argumentA(ia:ia+k-1, ja:*).A(ia:ia+k-1, ja:*) is modifiedby the routine but restored on exit.


ia, ja

1721



desca

(local)tauREAL for psormqrDOUBLE PRECISION for pdormqrArray, DIMENSION LOCc(ja+k-1)).Contains the scalar factor tau (i) of elementary reflectorsH(i) as returned by p?gerqf. tau is tied to the distributedmatrix A.

(local)cREAL for psormrqDOUBLE PRECISION for pdormrqPointer into the local memory to an array of local dimension(lld_c, LOCc(jc+n-1)).Contains the local pieces of the distributed matrix sub(C)to be factored.


ic, jc


descc

(local)workREAL for psormrqDOUBLE PRECISION for pdormrq.Workspace array of dimension of lwork.


lwork

If side = 'L',

lwork ≥ max((mb_a*(mb_a-1))/2, (mpc0 + max(mqa0+ numroc(numroc(n+iroffc, mb_a, 0, 0, NPROW),mb_a, 0, 0, lcmp), nqc0))*mb_a) + mb_a*mb_aelse if side ='R',

lwork ≥ max((mb_a*(mb_a-1))/2, (mpc0 +nqc0)*mb_a) + mb_a*mb_aend ifwhere

1722


lcmp = lcm/NPROW with lcm = ilcm (NPROW, NPCOL),iroffa = mod(ia-1, mb_a),icoffa = mod(ja-1, nb_a),iacol = indxg2p(ja, nb_a, MYCOL, csrc_a, NPCOL),mqa0 = numroc(n+icoffa, nb_a, MYCOL, iacol,NPCOL),iroffc = mod(ic-1, mb_c),icoffc = mod(jc-1, nb_c),icrow = indxg2p(ic, mb_c, MYROW, rsrc_c, NPROW),iccol = indxg2p(jc, nb_c, MYCOL, csrc_c, NPCOL),mpc0 = numroc(m+iroffc, mb_c, MYROW, icrow,NPROW),nqc0 = numroc(n+icoffc, nb_c, MYCOL, iccol,NPCOL),ilcm, indxg2p and numroc are ScaLAPACK tool functions;MYROW, MYCOL, NPROW and NPCOL can be determined bycalling the subroutine blacs_gridinfo.If lwork = -1, then lwork is global input and a workspacequery is assumed; the routine only calculates the minimumand optimal size for all work arrays. Each of these valuesis returned in the first entry of the corresponding work array,and no error message is issued by pxerbla.

Output Parameters

Overwritten by the product Q* sub(C), or Q'*sub (C), orsub(C)* Q', or sub(C)* Q

c


work(1)


1723


p?unmrqMultiplies a general matrix by the unitary matrixQ of the RQ factorization formed by p?gerqf.

Syntax

call pcunmrq( side, trans, m, n, k, a, ia, ja, desca, tau, c, ic, jc, descc,work, lwork, info)

call pzunmrq( side, trans, m, n, k, a, ia, ja, desca, tau, c, ic, jc, descc,work, lwork, info)

Description

The routine overwrites the general complex m-by-n distributed matrix sub (C)=C(ic:ic+m-1,jc:jc+n-1) with

side ='R'side ='L'




Q = H(1)' H(2)'... H(k)'

as returned by p?gerqf. Q is of order m if side = 'L' and of order n if side = 'R'.

Input Parameters





m



n

1724



k



(local)aCOMPLEX for pcunmrqDOUBLE COMPLEX for pzunmrq.Pointer into the local memory to an array of dimension(lld_a, LOCc(ja+m-1)) if side = 'L', and (lld_a,LOCc(ja+n-1)) if side = 'R'. The i-th row must containthe vector which defines the elementary reflector H(i),

ia≤i≤ia+k-1, as returned by p?gerqf in the k rows of itsdistributed matrix argument A(ia:ia +k-1,ja*).A(ia:ia +k-1, ja*) is modified by the routine butrestored on exit.


ia, ja


desca

(local)tauCOMPLEX for pcunmrqDOUBLE COMPLEX for pzunmrqArray, DIMENSION LOCc(ja+k-1)).Contains the scalar factor tau (i) of elementary reflectorsH(i) as returned by p?gerqf. tau is tied to the distributedmatrix A.

(local)cCOMPLEX for pcunmrqDOUBLE COMPLEX for pzunmrq.Pointer into the local memory to an array of local dimension(lld_c, LOCc(jc+n-1)).Contains the local pieces of the distributed matrix sub(C)to be factored.

1725



ic, jc


descc

(local)workCOMPLEX for pcunmrqDOUBLE COMPLEX for pzunmrq.Workspace array of dimension of lwork.


lwork

If side = 'L',

lwork ≥ max((mb_a*(mb_a-1))/2, (mpc0 +max(mqa0+numroc(numroc(n+iroffc, mb_a, 0, 0,NPROW), mb_a, 0, 0, lcmp), nqc0))*mb_a) +mb_a*mb_aelse if side = 'R',

lwork ≥ max((mb_a*(mb_a-1))/2, (mpc0 +nqc0)*mb_a) + mb_a*mb_aend ifwherelcmp = lcm/NPROW with lcm = ilcm(NPROW, NPCOL),iroffa = mod(ia-1, mb_a),icoffa = mod(ja-1, nb_a),iacol = indxg2p(ja, nb_a, MYCOL, csrc_a, NPCOL),mqa0 = numroc(m+icoffa, nb_a, MYCOL, iacol,NPCOL),iroffc = mod(ic-1, mb_c),icoffc = mod(jc-1, nb_c),icrow = indxg2p(ic, mb_c, MYROW, rsrc_c, NPROW),iccol = indxg2p(jc, nb_c, MYCOL, csrc_c, NPCOL),mpc0 = numroc(m+iroffc, mb_c, MYROW, icrow,NPROW),nqc0 = numroc(n+icoffc, nb_c, MYCOL, iccol,NPCOL),

1726


ilcm, indxg2p and numroc are ScaLAPACK tool functions;MYROW, MYCOL, NPROW and NPCOL can be determined bycalling the subroutine blacs_gridinfo.If lwork = -1, then lwork is global input and a workspacequery is assumed; the routine only calculates the minimumand optimal size for all work arrays. Each of these valuesis returned in the first entry of the corresponding work array,and no error message is issued by pxerbla.

Output Parameters

Overwritten by the product Q* sub(C) or Q'*sub (C), orsub(C)* Q', or sub(C)* Q

c


work(1)


p?tzrzfReduces the upper trapezoidal matrix A to uppertriangular form.

Syntax

call pstzrzf( m, n, a, ia, ja, desca, tau, work, lwork, info)

call pdtzrzf( m, n, a, ia, ja, desca, tau, work, lwork, info)

call pctzrzf( m, n, a, ia, ja, desca, tau, work, lwork, info)

call pztzrzf( m, n, a, ia, ja, desca, tau, work, lwork, info)

1727


Description

This routine reduces the m-by-n (m ≤ n) real/complex upper trapezoidal matrixsub(A)=(ia:ia+m-1,ja:ja+n-1) to upper triangular form by means of orthogonal/unitarytransformations. The upper trapezoidal matrix A is factored as

A = (R 0)*Z,


Input Parameters


sub(A); (m≥0).

m


sub(A) (n≥0).

n

(local)aREAL for pstzrzfDOUBLE PRECISION for pdtzrzf.COMPLEX for pctzrzf.DOUBLE COMPLEX for pztzrzf.Pointer into the local memory to an array of dimension(lld_a, LOCc(ja+n-1)). Contains the local pieces of them-by-n distributed matrix sub (A) to be factored.


ia, ja


desca

(local)workREAL for pstzrzfDOUBLE PRECISION for pdtzrzf.COMPLEX for pctzrzf.DOUBLE COMPLEX for pztzrzf.Workspace array of dimension of lwork.


least lwork ≥ mb_a*(mp0+nq0+mb_a), where

lwork

iroff = mod(ia-1, mb_a),

1728


icoff = mod(ja-1, nb_a),iarow = indxg2p(ia, mb_a, MYROW, rsrc_a, NPROW),iacol = indxg2p(ja, nb_a, MYCOL, csrc_a, NPCOL),mp0 = numroc (m+iroff, mb_a, MYROW, iarow,NPROW),nq0 = numroc (n+icoff, nb_a, MYCOL, iacol,NPCOL)indxg2p and numroc are ScaLAPACK tool functions; MYROW,MYCOL, NPROW and NPCOL can be determined by calling thesubroutine blacs_gridinfo.If lwork = -1, then lwork is global input and a workspacequery is assumed; the routine only calculates the minimumand optimal size for all work arrays. Each of these valuesis returned in the first entry of the corresponding work array,and no error message is issued by pxerbla.

Output Parameters

On exit, the leading m-by-m upper triangular part of sub(A)contains the upper triangular matrix R, and elements m+1to n of the first m rows of sub (A), with the array tau,represent the orthogonal/unitary matrix Z as a product ofm elementary reflectors.

a


work(1)

(local)tauREAL for pstzrzfDOUBLE PRECISION for pdtzrzf.COMPLEX for pctzrzf.DOUBLE COMPLEX for pztzrzf.Array, DIMENSION LOCr(ia+m-1)).Contains the scalar factor of elementary reflectors. tau istied to the distributed matrix A.


1729



Application Notes

The factorization is obtained by the Householder's method. The k-th transformation matrix,Z(k), which is or whose conjugate transpose is used to introduce zeros into the (m - k +1)-throw of sub(A), is given in the form

where

T(k) = i - tau*u(k)*u(k)',

tau is a scalar and Z(k) is an (n - m) element vector. tau and Z(k) are chosen to annihilate theelements of the k-th row of sub(A). The scalar tau is returned in the k-th element of tau andthe vector u(k) in the k-th row of sub(A), such that the elements of Z(k) are in a(k, m +1),..., a(k, n). The elements of R are returned in the upper triangular part of sub(A). Z isgiven by

Z = Z(1) * Z(2) *... * Z(m).

1730


p?ormrzMultiplies a general matrix by the orthogonal matrixfrom a reduction to upper triangular form formedby p?tzrzf.

Syntax

call psormrz( side, trans, m, n, k, l, a, ia, ja, desca, tau, c, ic, jc, descc,work, lwork, info)

call pdormrz( side, trans, m, n, k, l, a, ia, ja, desca, tau, c, ic, jc, descc,work, lwork, info)

Description

The routine overwrites the general real m-by-n distributed matrix sub(C) = C(ic:ic+m-1,jc:jc+n-1) with

side ='R'side ='L'




Q = H(1) H(2)... H(k)

as returned by p?tzrzf. Q is of order m if side = 'L' and of order n if side = 'R'.

Input Parameters





m



n

1731



k

If side = 'L', m ≥ k ≥0

If side = 'R', n ≥ k ≥0.

(global)lThe columns of the distributed submatrix sub(A) containingthe meaningful part of the Householder reflectors.

If side = 'L', m ≥ l ≥0

If side = 'R', n ≥ l ≥0.

(local)aREAL for psormrzDOUBLE PRECISION for pdormrz.Pointer into the local memory to an array of dimension(lld_a, LOCc(ja+m-1)) if side = 'L', and (lld_a,

LOCc(ja+n-1)) if side = 'R', where lld_a ≥max(1,LOCr(ia+k-1).The i-th row must contain the vector which defines the

elementary reflector H(i), ia≤i≤ia+k-1, as returned byp?tzrzf in the k rows of its distributed matrix argumentA(ia:ia+k-1, ja:*).A(ia:ia+k-1, ja:*) is modifiedby the routine but restored on exit.


ia, ja


desca

(local)tauREAL for psormrzDOUBLE PRECISION for pdormrzArray, DIMENSION LOCc(ia+k-1)).Contains the scalar factor tau (i) of elementary reflectorsH(i) as returned by p?tzrzf. tau is tied to the distributedmatrix A.

(local)cREAL for psormrz

1732


DOUBLE PRECISION for pdormrzPointer into the local memory to an array of local dimension(lld_c, LOCc(jc+n-1)).Contains the local pieces of the distributed matrix sub(C)to be factored.


ic, jc


descc

(local)workREAL for psormrzDOUBLE PRECISION for pdormrz.Workspace array of dimension of lwork.


lwork

If side = 'L',

lwork ≥ max((mb_a*(mb_a-1))/2, (mpc0 + max(mqa0+ numroc(numroc(n+iroffc, mb_a, 0, 0, NPROW),mb_a, 0, 0, lcmp), nqc0))*mb_a) + mb_a*mb_aelse if side ='R',

lwork ≥ max((mb_a*(mb_a-1))/2, (mpc0 +nqc0)*mb_a) + mb_a*mb_aend ifwherelcmp = lcm/NPROW with lcm = ilcm (NPROW, NPCOL),iroffa = mod(ia-1, mb_a), icoffa = mod(ja-1,nb_a),iacol = indxg2p(ja, nb_a, MYCOL, csrc_a, NPCOL),mqa0 = numroc(n+icoffa, nb_a, MYCOL, iacol,NPCOL),iroffc = mod(ic-1, mb_c),icoffc = mod(jc-1, nb_c),icrow = indxg2p(ic, mb_c, MYROW, rsrc_c, NPROW),iccol = indxg2p(jc, nb_c, MYCOL, csrc_c, NPCOL),mpc0 = numroc(m+iroffc, mb_c, MYROW, icrow,NPROW),

1733


nqc0 = numroc(n+icoffc, nb_c, MYCOL, iccol,NPCOL),ilcm, indxg2p and numroc are ScaLAPACK tool functions;MYROW, MYCOL, NPROW and NPCOL can be determined bycalling the subroutine blacs_gridinfo.If lwork = -1, then lwork is global input and a workspacequery is assumed; the routine only calculates the minimumand optimal size for all work arrays. Each of these valuesis returned in the first entry of the corresponding work array,and no error message is issued by pxerbla.

Output Parameters

Overwritten by the product Q*sub(C), or Q'*sub (C), orsub(C)*Q', or sub(C)*Q

c


work(1)


p?unmrzMultiplies a general matrix by the unitarytransformation matrix from a reduction to uppertriangular form determined by p?tzrzf.

Syntax

call pcunmrz( side, trans, m, n, k, l, a, ia, ja, desca, tau, c, ic, jc, descc,work, lwork, info)

call pzunmrz( side, trans, m, n, k, l, a, ia, ja, desca, tau, c, ic, jc, descc,work, lwork, info)

1734


Description

The routine overwrites the general complex m-by-n distributed matrix sub(C) =C(ic:ic+m-1,jc:jc+n-1) with

side ='R'side ='L'




Q = H(1)' H(2)'... H(k)'

as returned by pctzrzf/pztzrzf. Q is of order m if side = 'L' and of order n if side = 'R'.

Input Parameters




matrix sub(C), (m≥0).

m


matrix sub(C), (n≥0).

n


k



(global) INTEGER. The columns of the distributed submatrixsub(A) containing the meaningful part of the Householderreflectors.

l

If side = 'L', m≥l≥0

If side = 'R', n≥l≥0.

(local)a

1735


COMPLEX for pcunmrzDOUBLE COMPLEX for pzunmrz.Pointer into the local memory to an array of dimension(lld_a, LOCc(ja+m-1)) if side = 'L', and (lld_a,

LOCc(ja+n-1)) if side = 'R', where lld_a≥max(1, LOCr(ja+k-1). The i-th row must contain the vector which

defines the elementary reflector H(i), ia≤i≤ia+k-1, asreturned by p?gerqf in the k rows of its distributed matrixargument A(ia:ia +k-1, ja*). A(ia:ia +k-1, ja*)is modified by the routine but restored on exit.


ia, ja


desca

(local)tauCOMPLEX for pcunmrzDOUBLE COMPLEX for pzunmrzArray, DIMENSION LOCc(ia+k-1)).Contains the scalar factor tau (i) of elementary reflectorsH(i) as returned by p?gerqf. tau is tied to the distributedmatrix A.

(local)cCOMPLEX for pcunmrzDOUBLE COMPLEX for pzunmrz.Pointer into the local memory to an array of local dimension(lld_c, LOCc(jc+n-1)).Contains the local pieces of the distributed matrix sub(C)to be factored.


ic, jc


descc

(local)workCOMPLEX for pcunmrz

1736


DOUBLE COMPLEX for pzunmrz.Workspace array of dimension lwork.


lwork

If side = 'L',

lwork ≥ max((mb_a*(mb_a-1))/2,(mpc0+max(mqa0+numroc(numroc(n+iroffc, mb_a, 0,0, NPROW), mb_a, 0, 0, lcmp), nqc0))*mb_a) +mb_a*mb_aelse if side ='R',

lwork ≥ max((mb_a*(mb_a-1))/2, (mpc0+nqc0)*mb_a)+ mb_a*mb_aend ifwherelcmp = lcm/NPROW with lcm = ilcm(NPROW, NPCOL),iroffa = mod(ia-1, mb_a),icoffa = mod(ja-1, nb_a),iacol = indxg2p(ja, nb_a, MYCOL, csrc_a, NPCOL),mqa0 = numroc(m+icoffa, nb_a, MYCOL, iacol,NPCOL),iroffc = mod(ic-1, mb_c),icoffc = mod(jc-1, nb_c),icrow = indxg2p(ic, mb_c, MYROW, rsrc_c, NPROW),iccol = indxg2p(jc, nb_c, MYCOL, csrc_c, NPCOL),mpc0 = numroc(m+iroffc, mb_c, MYROW, icrow,NPROW),nqc0 = numroc(n+icoffc, nb_c, MYCOL, iccol,NPCOL),ilcm, indxg2p and numroc are ScaLAPACK tool functions;MYROW, MYCOL, NPROW and NPCOL can be determined bycalling the subroutine blacs_gridinfo.If lwork = -1, then lwork is global input and a workspacequery is assumed; the routine only calculates the minimumand optimal size for all work arrays. Each of these valuesis returned in the first entry of the corresponding work array,and no error message is issued by pxerbla.

1737


Output Parameters

Overwritten by the product Q* sub(C), or Q'*sub (C), orsub(C)*Q', or sub(C)*Q

c


work(1)

(global) INTEGER.info= 0: the execution is successful.< 0: if the i-th argument is an array and the j-entry hadan illegal value, then info = - (i*100+j), if the i-thargument is a scalar and had an illegal value, then info =-i.

p?ggqrfComputes the generalized QR factorization.

Syntax

call psggqrf(n, m, p, a, ia, ja, desca, taua, b, ib, jb, descb, taub, work,lwork, info)

call pdggqrf(n, m, p, a, ia, ja, desca, taua, b, ib, jb, descb, taub, work,lwork, info)

call pcggqrf(n, m, p, a, ia, ja, desca, taua, b, ib, jb, descb, taub, work,lwork, info)

call pzggqrf(n, m, p, a, ia, ja, desca, taua, b, ib, jb, descb, taub, work,lwork, info)

Description

The routine forms the generalized QR factorization of an n-by-m matrix

sub(A) = A(ia:ia+n-1, ja:ja+m-1)

and an n-by-p matrix

sub(B) = B(ib:ib+n-1, jb:jb+p-1):

as

sub(A) = Q*R, sub(B) = Q*T*Z,

1738

where Q is an n-by-n orthogonal/unitary matrix, Z is a p-by-p orthogonal/unitary matrix, andR and T assume one of the forms:

If n ≥ m

or if n <m

where R11 is upper triangular, and

where T12 or T21 is an upper triangular matrix.

In particular, if sub(B) is square and nonsingular, the GQR factorization of sub(A) and sub(B)implicitly gives the QR factorization of inv (sub(B))* sub (A):

inv(sub(B))*sub(A) = ZH*(inv(T)*R

1739

Input Parameters


matrices sub (A) and sub(B) (n≥0).

n


matrix sub(A) (m≥0).

m

INTEGER. The number of columns in the distributed matrix

sub(B) (p≥0).

p

(local)aREAL for psggqrfDOUBLE PRECISION for pdggqrfCOMPLEX for pcggqrfDOUBLE COMPLEX for pzggqrf.Pointer into the local memory to an array of dimension(lld_a, LOCc(ja+m-1)). Contains the local pieces of then-by-m matrix sub(A) to be factored.


ia, ja


desca

(local)bREAL for psggqrfDOUBLE PRECISION for pdggqrfCOMPLEX for pcggqrfDOUBLE COMPLEX for pzggqrf.Pointer into the local memory to an array of dimension(lld_b, LOCc(jb+p-1)). Contains the local pieces of then-by-p matrix sub(B) to be factored.

(global) INTEGER. The row and column indices in the globalarray b indicating the first row and the first column of thesubmatrix B, respectively.

ib, jb


descb

(local)work

1740


REAL for psggqrfDOUBLE PRECISION for pdggqrfCOMPLEX for pcggqrfDOUBLE COMPLEX for pzggqrf.Workspace array of dimension of lwork.

(local or global) INTEGER. Dimension of work, must be atleast

lwork

lwork ≥ max(nb_a*(npa0+mqa0+nb_a),max((nb_a*(nb_a-1))/2,(pqb0+npb0)*nb_a)+nb_a*nb_a,mb_b*(npb0+pqb0+mb_b)),whereiroffa = mod(ia-1, mb_A),icoffa = mod(ja-1, nb_a),iarow = indxg2p(ia, mb_a, MYROW, rsrc_a, NPROW),iacol = indxg2p(ja, nb_a, MYCOL, csrc_a, NPCOL),npa0 = numroc (n+iroffa, mb_a, MYROW, iarow,NPROW),mqa0 = numroc (m+icoffa, nb_a, MYCOL, iacol,NPCOL)iroffb = mod(ib-1, mb_b),icoffb = mod(jb-1, nb_b),ibrow = indxg2p(ib, mb_b, MYROW, rsrc_b, NPROW),ibcol = indxg2p(jb, nb_b, MYCOL, csrc_b, NPCOL),npb0 = numroc (n+iroffa, mb_b, MYROW, Ibrow,NPROW),pqb0 = numroc(m+icoffb, nb_b, MYCOL, ibcol,NPCOL)and numroc, indxg2p are ScaLAPACK tool functions; MYROW,MYCOL, NPROW and NPCOL can be determined by calling thesubroutine blacs_gridinfo.If lwork = -1, then lwork is global input and a workspacequery is assumed; the routine only calculates the minimumand optimal size for all work arrays. Each of these valuesis returned in the first entry of the corresponding work array,and no error message is issued by pxerbla.

1741


Output Parameters

On exit, the elements on and above the diagonal of sub (A)contain the min(n, m)-by-m upper trapezoidal matrix R (R is

upper triangular if n ≥ m); the elements below the diagonal,

a

with the array taua, represent the orthogonal/unitary matrixQ as a product of min(n,m) elementary reflectors. (SeeApplication Notes below).

(local)taua, taubREAL for psggqrfDOUBLE PRECISION for pdggqrfCOMPLEX for pcggqrfDOUBLE COMPLEX for pzggqrf.Arrays, DIMENSION LOCc(ja+min(n,m)-1)for taua andLOCr(ib+n-1) for taub.The array taua contains the scalar factors of the elementaryreflectors which represent the orthogonal/unitary matrix Q.taua is tied to the distributed matrix A. (See ApplicationNotes below).The array taub contains the scalar factors of the elementaryreflectors which represent the orthogonal/unitary matrix Z.taub is tied to the distributed matrix B. (See ApplicationNotes below).


work(1)


Application Notes


Q = H(ja) H(ja+1)... H(ja+k-1),

where k = min(n,m).

1742



H(i) = i - taua * v * v'

where taua is a real/complex scalar, and v is a real/complex vector with v(1:i-1) = 0 andv(i) = 1; v(i+1:n) is stored on exit in A(ia+i:ia+n-1,ja+i-1), and taua intaua(ja+i-1).To form Q explicitly, use ScaLAPACK subroutine p?orgqr/p?ungqr. To use Qto update another matrix, use ScaLAPACK subroutine p?ormqr/p?unmqr.

The matrix Z is represented as a product of elementary reflectors

Z = H(ib) H(ib+1) . . . H(ib+k-1), where k = min(n,p).


H(i) = i - taub * v * v'

where taub is a real/complex scalar, and v is a real/complex vector with v(p-k+i+1:p) = 0and v(p-k+i) = 1; v(1:p-k+i-1) is stored on exit in B(ib+n-k+i-1,jb:jb+p-k+i-2),and taub in taub(ib+n-k+i-1). To form Z explicitly, use ScaLAPACK subroutinep?orgrq/p?ungrq. To use Z to update another matrix, use ScaLAPACK subroutinep?ormrq/p?unmrq.

p?ggrqfComputes the generalized RQ factorization.

Syntax

call psggrqf(m, p, n, a, ia, ja, desca, taua, b, ib, jb, descb, taub, work,lwork, info)

call pdggrqf(m, p, n, a, ia, ja, desca, taua, b, ib, jb, descb, taub, work,lwork, info)

call pcggrqf(m, p, n, a, ia, ja, desca, taua, b, ib, jb, descb, taub, work,lwork, info)

call pzggrqf(m, p, n, a, ia, ja, desca, taua, b, ib, jb, descb, taub, work,lwork, info)

Description

The routine forms the generalized RQ factorization of an m-by-n matrix sub(A)=(ia:ia+m-1,ja:ja+n-1) and a p-by-n matrix sub(B)=(ib:ib+p-1, ja:ja+n-1):

1743


sub(A) = R*Q, sub(B) = Z*T*Q,

where Q is an n-by-n orthogonal/unitary matrix, Z is a p-by-p orthogonal/unitary matrix, andR and T assume one of the forms:

or

where R11 or R21 is upper triangular, and

or

where T11 is upper triangular.

1744


In particular, if sub(B) is square and nonsingular, the GRQ factorization of sub(A) and sub(B)implicitly gives the RQ factorization of sub (A)*inv(sub(B)):

sub(A)*inv(sub(B))= (R*inv(T))*Z'

where inv(sub(B)) denotes the inverse of the matrix sub(B), and Z' denotes the transpose ofmatrix Z.

Input Parameters


matrices sub (A) (m≥0).

m

INTEGER. The number of rows in the distributed matrix

sub(B) (p≥0).

p


matrices sub(A) and sub(B) (n≥0).

n

(local)aREAL for psggrqfDOUBLE PRECISION for pdggrqfCOMPLEX for pcggrqfDOUBLE COMPLEX for pzggrqf.Pointer into the local memory to an array of dimension(lld_a, LOCc(ja+n-1)). Contains the local pieces of them-by-n distributed matrix sub(A) to be factored.


ia, ja


desca

(local)bREAL for psggrqfDOUBLE PRECISION for pdggrqfCOMPLEX for pcggrqfDOUBLE COMPLEX for pzggrqf.Pointer into the local memory to an array of dimension(lld_b, LOCc(jb+n-1)).Contains the local pieces of the p-by-n matrix sub(B) to befactored.

1745



ib, jb


descb

(local)workREAL for psggrqfDOUBLE PRECISION for pdggrqfCOMPLEX for pcggrqfDOUBLE COMPLEX for pzggrqf.Workspace array of dimension of lwork.

(local or global) INTEGER.lwork

Dimension of work, must be at least lwork ≥max(mb_a*(mpa0+nqa0+mb_a),max((mb_a*(mb_a-1))/2, (ppb0+nqb0)*mb_a) +mb_a*mb_a, nb_b*(ppb0+nqb0+nb_b)), whereiroffa = mod(ia-1, mb_A),icoffa = mod(ja-1, nb_a),iarow = indxg2p(ia, mb_a, MYROW, rsrc_a, NPROW),iacol = indxg2p(ja, nb_a, MYCOL, csrc_a, NPCOL),mpa0 = numroc (m+iroffa, mb_a, MYROW, iarow,NPROW),nqa0 = numroc (m+icoffa, nb_a, MYCOL, iacol,NPCOL)iroffb = mod(ib-1, mb_b),icoffb = mod(jb-1, nb_b),ibrow = indxg2p(ib, mb_b, MYROW, rsrc_b, NPROW),ibcol = indxg2p(jb, nb_b, MYCOL, csrc_b, NPCOL),ppb0 = numroc (p+iroffb, mb_b, MYROW, ibrow,NPROW),nqb0 = numroc (n+icoffb, nb_b, MYCOL, ibcol,NPCOL)and numroc, indxg2p are ScaLAPACK tool functions; MYROW,MYCOL, NPROW and NPCOL can be determined by calling thesubroutine blacs_gridinfo.

1746



Output Parameters

On exit, if m≤n, the upper triangle of A(ia:ia+m-1,ja+n-m:ja+n-1) contains the m-by-m upper triangular

matrix R; if m≥n, the elements on and above the (m-n)-th

a

subdiagonal contain the m-by-n upper trapezoidal matrix R;the remaining elements, with the array taua, represent theorthogonal/unitary matrix Q as a product of min(n, m)elementary reflectors. (See Application Notes below).

(local)taua, taubREAL for psggqrfDOUBLE PRECISION for pdggqrfCOMPLEX for pcggqrfDOUBLE COMPLEX for pzggqrf.Arrays, DIMENSION LOCr(ia+m-1)for taua andLOCc(jb+min(p,n)-1) for taub.The array taua contains the scalar factors of the elementaryreflectors which represent the orthogonal/unitary matrix Q.taua is tied to the distributed matrix A.(See ApplicationNotes below).The array taub contains the scalar factors of the elementaryreflectors which represent the orthogonal/unitary matrix Z.taub is tied to the distributed matrix B. (See ApplicationNotes below).


work(1)


1747


Application Notes


Q = H(ia) H(ia+1)... H(ia+k-1),

where k = min(m,n).


H(i) = i - taua * v * v'

where taua is a real/complex scalar, and v is a real/complex vector with v(n-k+i+1:n) = 0and v(n-k+i) = 1; v(1:n-k+i-1) is stored on exit in A(ia+m-k+i-1, ja:ja+n-k+i-2),and taua in taua(ia+m-k+i-1). To form Q explicitly, use ScaLAPACK subroutinep?orgrq/p?ungrq. To use Q to update another matrix, use ScaLAPACK subroutinep?ormrq/p?unmrq.

The matrix Z is represented as a product of elementary reflectors

Z = H(jb) H(jb+1)... H(jb+k-1), where k = min(p,n).


H(i) = i - taub * v * v'

where taub is a real/complex scalar, and v is a real/complex vector with v(1:i-1) = 0 andv(i)= 1; v(i+1:p) is stored on exit in B(ib+i:ib+p-1,jb+i-1), and taub in taub(jb+i-1).To form Z explicitly, use ScaLAPACK subroutine p?orgqr/p?ungqr. To use Z to update anothermatrix, use ScaLAPACK subroutine p?ormqr/p?unmqr.


To solve a symmetric eigenproblem with ScaLAPACK, you usually need to reduce the matrix toreal tridiagonal form T and then find the eigenvalues and eigenvectors of the tridiagonal matrixT. ScaLAPACK includes routines for reducing the matrix to a tridiagonal form by an orthogonal(or unitary) similarity transformation A = QTQH as well as for solving tridiagonal symmetriceigenvalue problems. These routines are listed in Table 6-4 .

There are different routines for symmetric eigenproblems, depending on whether you needeigenvalues only or eigenvectors as well, and on the algorithm used (either the QTQ algorithm,or bisection followed by inverse iteration).

1748


Table 6-4 Computational Routines for Solving Symmetric Eigenproblems

Symmetrictridiagonalmatrix

Orthogonal/unitarymatrix

Densesymmetric/Hermitianmatrix

Operation

p?sytrd/p?hetrdReduce to tridiagonal form A =QTQH

p?ormtr/p?unmtrMultiply matrix after reduction

steqr2*)Find all eigenvalues andeigenvectors of a tridiagonalmatrix T by a QTQ method

p?stebzFind selected eigenvalues of atridiagonal matrix T via bisection

p?steinFind selected eigenvectors of atridiagonal matrix T by inverseiteration

*) This routine is described as part of auxiliary ScaLAPACK routines.

p?sytrdReduces a symmetric matrix to real symmetrictridiagonal form by an orthogonal similaritytransformation.

Syntax

call pssytrd( uplo, n, a, ia, ja, desca, d, e, tau, work, lwork, info)

call pdsytrd( uplo, n, a, ia, ja, desca, d, e, tau, work, lwork, info)

Description

This routine reduces a real symmetric matrix sub(A) to symmetric tridiagonal form T by anorthogonal similarity transformation:

Q'*sub(A)*Q = T,

where sub(A) = A(ia:ia+n-1,ja:ja+n-1).

Input Parameters

(global) CHARACTER.uploSpecifies whether the upper or lower triangular part of thesymmetric matrix sub(A) is stored:If uplo = 'U', upper triangular

1749


If uplo = 'L', lower triangular


(n≥0).

n

(local)aREAL for pssytrdDOUBLE PRECISION for pdsytrd.Pointer into the local memory to an array of dimension(lld_a, LOCc(ja+n-1)). On entry, this array containsthe local pieces of the symmetric distributed matrix sub(A).If uplo = 'U', the leading n-by-n upper triangular part ofsub(A) contains the upper triangular part of the matrix, andits strictly lower triangular part is not referenced.If uplo = 'L', the leading n-by-n lower triangular part ofsub(A) contains the lower triangular part of the matrix, andits strictly upper triangular part is not referenced. SeeApplication Notes below.


ia, ja


desca

(local)workREAL for pssytrdDOUBLE PRECISION for pdsytrd.Workspace array of dimension lwork.


lwork

lwork ≥ max(NB*(np +1), 3*NB),where NB = mb_a = nb_a,np = numroc(n, NB, MYROW, iarow, NPROW),iarow = indxg2p(ia, NB, MYROW, rsrc_a, NPROW).indxg2p and numroc are ScaLAPACK tool functions; MYROW,MYCOL, NPROW and NPCOL can be determined by calling thesubroutine blacs_gridinfo.

1750



Output Parameters

On exit, if uplo = 'U', the diagonal and first superdiagonalof sub(A) are overwritten by the corresponding elements ofthe tridiagonal matrix T, and the elements above the first

a

superdiagonal, with the array tau, represent the orthogonalmatrix Q as a product of elementary reflectors; if uplo ='L', the diagonal and first subdiagonal of sub(A) areoverwritten by the corresponding elements of the tridiagonalmatrix T, and the elements below the first subdiagonal, withthe array tau, represent the orthogonal matrix Q as aproduct of elementary reflectors. See Application Notesbelow.

(local)dREAL for pssytrdDOUBLE PRECISION for pdsytrd.Arrays, DIMENSION LOCc(ja+n-1) .The diagonal elementsof the tridiagonal matrix T:d(i)= A(i,i).d is tied to the distributed matrix A.

(local)eREAL for pssytrdDOUBLE PRECISION for pdsytrd.Arrays, DIMENSION LOCc(ja+n-1) if uplo = 'U',LOCc(ja+n-2) otherwise.The off-diagonal elements of the tridiagonal matrix T:e(i)= A(i,i+1) if uplo = 'U',e(i) = A(i+1,i) if uplo = 'L'.e is tied to the distributed matrix A.

(local)tauREAL for pssytrdDOUBLE PRECISION for pdsytrd.

1751


Arrays, DIMENSION LOCc(ja+n-1). This array contains thescalar factors tau of the elementary reflectors. tau is tiedto the distributed matrix A.


work(1)


Application Notes


Q = H(n-1)... H(2) H(1).


H(i) = i - tau * v * v',

where tau is a real scalar, and v is a real vector with v(i+1:n) = 0 and v(i) = 1; v(1:i-1)is stored on exit in A(ia:ia+i-2, ja+i), and tau in tau (ja+i-1).


Q = H(1) H(2)... H(n-1).


H(i) = i - tau * v * v',

where tau is a real scalar, and v is a real vector with v(1:i) = 0 and v(i+1) = 1; v(i+2:n)is stored on exit in A(ia+i+1:ia+n-1,ja+i-1), and tau in tau(ja+i-1).

The contents of sub(A) on exit are illustrated by the following examples with n = 5:

If uplo = 'U':

1752


If uplo = 'L':

where d and e denote diagonal and off-diagonal elements of T, and v i denotes an element ofthe vector defining H(i).

p?ormtrMultiplies a general matrix by the orthogonaltransformation matrix from a reduction totridiagonal form determined by p?sytrd.

Syntax

call psormtr( side, uplo, trans, m, n, a, ia, ja, desca, tau, c, ic, jc, descc,work, lwork, info)

call pdormtr( side, uplo, trans, m, n, a, ia, ja, desca, tau, c, ic, jc, descc,work, lwork, info)

1753


Description

The routine overwrites the general real distributed m-by-n matrix sub(C) =C(ic:ic+m-1,jc:jc+n-1) with

side ='R'side ='L'



where Q is a real orthogonal distributed matrix of order nq, with nq = m if side = 'L' and nq= n if side = 'R'.

Q is defined as the product of nq elementary reflectors, as returned by p?sytrd.

If uplo = 'U', Q = H(nq-1)... H(2) H(1);

If uplo = 'L', Q = H(1) H(2)... H(nq-1).

Input Parameters



(global) CHARACTER.uplo= 'U': Upper triangle of A(ia:*, ja:*) containselementary reflectors from p?sytrd;= 'L': Lower triangle of A(ia:*,ja:*) contains elementaryreflectors from p?sytrd



m



n

(local)aREAL for psormtrDOUBLE PRECISION for pdormtr.

1754


Pointer into the local memory to an array of dimension(lld_a, LOCc(ja+m-1)) if side='L', or (lld_a,LOCc(ja+n-1)) if side = 'R'.Contains the vectors which define the elementary reflectors,as returned by p?sytrd.

If side='L', lld_a ≥ max(1,LOCr(ia+m-1));

If side ='R', lld_a ≥ max(1, LOCr(ia+n-1)).


ia, ja


desca

(local)tauREAL for psormtrDOUBLE PRECISION for pdormtr.Array, DIMENSION of ltau whereif side = 'L' and uplo = 'U', ltau = LOCc(m_a),if side = 'L' and uplo = 'L', ltau = LOCc(ja+m-2),if side = 'R' and uplo = 'U', ltau = LOCc(n_a),if side = 'R' and uplo = 'L', ltau = LOCc(ja+n-2).tau(i) must contain the scalar factor of the elementaryreflector H(i), as returned by p?sytrd. tau is tied to thedistributed matrix A.

(local) REAL for psormtrcDOUBLE PRECISION for pdormtr.Pointer into the local memory to an array of dimension(lld_a, LOCc (ja+n-1)). Contains the local pieces ofthe distributed matrix sub (C).


ic, jc


descc

(local)workREAL for psormtrDOUBLE PRECISION for pdormtr.

1755


Workspace array of dimension lwork.


lwork

if uplo = 'U',iaa= ia; jaa= ja+1, icc= ic; jcc= jc;else uplo = 'L',iaa= ia+1, jaa= ja;If side = 'L',icc= ic+1; jcc= jc;else icc= ic; jcc= jc+1;end ifend ifIf side = 'L',mi= m-1; ni= n

lwork ≥ max((nb_a*(nb_a-1))/2, (nqc0 +mpc0)*nb_a) + nb_a*nb_aelseIf side = 'R',mi= m; mi = n-1;

lwork ≥ max((nb_a*(nb_a-1))/2, (nqc0 +max(npa0+numroc(numroc(ni+icoffc, nb_a, 0, 0,NPCOL), nb_a, 0, 0, lcmq), mpc0))*nb_a)+nb_a*nb_aend ifwhere lcmq = lcm/NPCOL with lcm = ilcm(NPROW,NPCOL),iroffa = mod(iaa-1, mb_a),icoffa = mod(jaa-1, nb_a),iarow = indxg2p(iaa, mb_a, MYROW, rsrc_a, NPROW),npa0 = numroc(ni+iroffa, mb_a, MYROW, iarow,NPROW),iroffc = mod(icc-1, mb_c),icoffc = mod(jcc-1, nb_c),icrow = indxg2p(icc, mb_c, MYROW, rsrc_c, NPROW),iccol = indxg2p(jcc, nb_c, MYCOL, csrc_c, NPCOL),mpc0 = numroc(mi+iroffc, mb_c, MYROW, icrow,NPROW),

1756


nqc0 = numroc(ni+icoffc, nb_c, MYCOL, iccol,NPCOL),ilcm, indxg2p and numroc are ScaLAPACK tool functions;MYROW, MYCOL, NPROW and NPCOL can be determined bycalling the subroutine blacs_gridinfo. If lwork = -1,then lwork is global input and a workspace query isassumed; the routine only calculates the minimum andoptimal size for all work arrays. Each of these values isreturned in the first entry of the corresponding work array,and no error message is issued by pxerbla.

Output Parameters

Overwritten by the product Q*sub(C), or Q'*sub(C), orsub(C)*Q', or sub(C)*Q.

c


work(1)


p?hetrdReduces a Hermitian matrix to Hermitiantridiagonal form by a unitary similaritytransformation.

Syntax

call pchetrd( uplo, n, a, ia, ja, desca, d, e, tau, work, lwork, info)

call pzhetrd( uplo, n, a, ia, ja, desca, d, e, tau, work, lwork, info)

Description

This routine reduces a complex Hermitian matrix sub(A) to Hermitian tridiagonal form T by aunitary similarity transformation:

1757


Q'*sub(A)*Q = T

where sub(A) = A(ia:ia+n-1,ja:ja+n-1).

Input Parameters

(global) CHARACTER.uploSpecifies whether the upper or lower triangular part of theHermitian matrix sub(A) is stored:If uplo = 'U', upper triangularIf uplo = 'L', lower triangular


(n≥0).

n

(local)aCOMPLEX for pchetrdDOUBLE COMPLEX for pzhetrd.Pointer into the local memory to an array of dimension(lld_a, LOCc(ja+n-1)). On entry, this array containsthe local pieces of the Hermitian distributed matrix sub(A).If uplo = 'U', the leading n-by-n upper triangular part ofsub(A) contains the upper triangular part of the matrix, andits strictly lower triangular part is not referenced.If uplo = 'L', the leading n-by-n lower triangular part ofsub(A) contains the lower triangular part of the matrix, andits strictly upper triangular part is not referenced.(SeeApplication Notes below).


ia, ja


desca

(local)workCOMPLEX for pchetrdDOUBLE COMPLEX for pzhetrd.Workspace array of dimension lwork.


lwork

lwork≥max(NB*(np +1), 3*NB)

1758


where NB = mb_a = nb_a,np = numroc(n, NB, MYROW, iarow, NPROW),iarow = indxg2p(ia, NB, MYROW, rsrc_a, NPROW).indxg2p and numroc are ScaLAPACK tool functions; MYROW,MYCOL, NPROW and NPCOL can be determined by calling thesubroutine blacs_gridinfo.If lwork = -1, then lwork is global input and a workspacequery is assumed; the routine only calculates the minimumand optimal size for all work arrays. Each of these valuesis returned in the first entry of the corresponding work array,and no error message is issued by pxerbla.

Output Parameters

On exit,aIf uplo = 'U', the diagonal and first superdiagonal ofsub(A) are overwritten by the corresponding elements ofthe tridiagonal matrix T, and the elements above the firstsuperdiagonal, with the array tau, represent the unitarymatrix Q as a product of elementary reflectors;if uplo ='L', the diagonal and first subdiagonal of sub(A) areoverwritten by the corresponding elements of the tridiagonalmatrix T, and the elements below the first subdiagonal, withthe array tau, represent the unitary matrix Q as a productof elementary reflectors. (See Application Notes below).

(local)dREAL for pchetrdDOUBLE PRECISION for pzhetrd.Arrays, DIMENSION LOCc(ja+n-1). The diagonal elementsof the tridiagonal matrix T:d(i)= A(i,i).d is tied to the distributed matrix A.

(local)eREAL for pchetrdDOUBLE PRECISION for pzhetrd.Arrays, DIMENSION LOCc(ja+n-1) if uplo = 'U';LOCc(ja+n-2) - otherwise.The off-diagonal elements of the tridiagonal matrix T:

1759


e(i)= A(i,i+1) if uplo = 'U',e(i)= A(i+1,i) if uplo = 'L'.e is tied to the distributed matrix A.

(local) COMPLEX for pchetrdtauDOUBLE COMPLEX for pzhetrd.Arrays, DIMENSION LOCc(ja+n-1). This array contains thescalar factors tau of the elementary reflectors. tau is tiedto the distributed matrix A.


work(1)

(global) INTEGER.info= 0: the execution is successful.< 0: if the i-th argument is an array and the j-entry hadan illegal value, then info = - (i*100+j), if the i-thargument is a scalar and had an illegal value, then info =-i.

Application Notes


Q = H(n-1)... H(2) H(1).


H(i) = i - tau * v * v',

where tau is a complex scalar, and v is a complex vector with v(i+1:n) = 0 and v(i) = 1;v(1:i-1) is stored on exit in A(ia:ia+i-2, ja+i), and tau in tau (ja+i-1).


Q = H(1) H(2)... H(n-1).


H(i) = i - tau * v * v',

where tau is a complex scalar, and v is a complex vector with v(1:i) = 0 and v(i+1) = 1;v(i+2:n) is stored on exit in A(ia+i+1:ia+n-1,ja+i-1), and tau in tau(ja+i-1).

The contents of sub(A) on exit are illustrated by the following examples with n = 5:

If uplo = 'U':

1760

If uplo = 'L':

where d and e denote diagonal and off-diagonal elements of T, and v i denotes an element ofthe vector defining H(i).

1761


p?unmtrMultiplies a general matrix by the unitarytransformation matrix from a reduction totridiagonal form determined by p?hetrd.

Syntax

call pcunmtr( side, uplo, trans, m, n, a, ia, ja, desca, tau, c, ic, jc, descc,work, lwork, info)

call pzunmtr( side, uplo, trans, m, n, a, ia, ja, desca, tau, c, ic, jc, descc,work, lwork, info)

Description

The routine overwrites the general complex distributed m-by-n matrix sub(C) =C(ic:ic+m-1,jc:jc+n-1) with

side ='R'side ='L'



where Q is a complex unitary distributed matrix of order nq, with nq = m if side = 'L' andnq = n if side = 'R'.

Q is defined as the product of nq-1 elementary reflectors, as returned by p?hetrd.

If uplo = 'U', Q = H(nq-1)... H(2) H(1);

If uplo = 'L', Q = H(1) H(2)... H(nq-1).

Input Parameters



(global) CHARACTER.uplo

1762


= 'U': Upper triangle of A(ia:*, ja:*) containselementary reflectors from p?hetrd;= 'L': Lower triangle of A(ia:*,ja:*) contains elementaryreflectors from p?hetrd



m



n

(local)aREAL for pcunmtrDOUBLE PRECISION for pzunmtr.Pointer into the local memory to an array of dimension(lld_a, LOCc(ja+m-1)) if side='L', or (lld_a,LOCc(ja+n-1)) if side = 'R'.Contains the vectors which define the elementary reflectors,as returned by p?hetrd.

If side='L', lld_a ≥ max(1,LOCr(ia+m-1));

If side ='R', lld_a ≥ max(1, LOCr(ia+n-1)).


ia, ja


desca

(local)tauCOMPLEX for pcunmtrDOUBLE COMPLEX for pzunmtr.Array, DIMENSION of ltau whereIf side = 'L' and uplo = 'U', ltau = LOCc(m_a),if side = 'L' and uplo = 'L', ltau = LOCc(ja+m-2),if side = 'R' and uplo = 'U', ltau = LOCc(n_a),if side = 'R' and uplo = 'L', ltau = LOCc(ja+n-2).tau(i) must contain the scalar factor of the elementaryreflector H(i), as returned by p?hetrd. tau is tied to thedistributed matrix A.

(local) COMPLEX for pcunmtrc

1763


DOUBLE COMPLEX for pzunmtr.Pointer into the local memory to an array of dimension(lld_a, LOCc (ja+n-1)). Contains the local pieces ofthe distributed matrix sub (C).


ic, jc


descc

(local)workCOMPLEX for pcunmtrDOUBLE COMPLEX for pzunmtr.Workspace array of dimension lwork.


lwork

If uplo = 'U',iaa= ia; jaa= ja+1, icc= ic; jcc= jc;else uplo = 'L',iaa= ia+1, jaa= ja;If side = 'L',icc= ic+1; jcc= jc;else icc= ic; jcc= jc+1;end ifend ifIf side = 'L',mi= m-1; ni= n

lwork ≥ max((nb_a*(nb_a-1))/2, (nqc0 +mpc0)*nb_a) + nb_a*nb_aelseIf side = 'R',mi= m; mi = n-1;

lwork ≥ max((nb_a*(nb_a-1))/2, (nqc0 +max(npa0+numroc(numroc(ni+icoffc, nb_a, 0, 0,NPCOL), nb_a, 0, 0, lcmq), mpc0))*nb_a) +nb_a*nb_aend if

1764


where lcmq = lcm/NPCOL with lcm = ilcm(NPROW,NPCOL),iroffa = mod(iaa-1, mb_a),icoffa = mod(jaa-1, nb_a),iarow = indxg2p(iaa, mb_a, MYROW, rsrc_a, NPROW),npa0 = numroc(ni+iroffa, mb_a, MYROW, iarow, NPROW),iroffc = mod(icc-1, mb_c),icoffc = mod(jcc-1, nb_c),icrow = indxg2p(icc, mb_c, MYROW, rsrc_c, NPROW),iccol = indxg2p(jcc, nb_c, MYCOL, csrc_c, NPCOL),mpc0 = numroc(mi+iroffc, mb_c, MYROW, icrow,NPROW),nqc0 = numroc(ni+icoffc, nb_c, MYCOL, iccol,NPCOL),ilcm, indxg2p and numroc are ScaLAPACK tool functions;MYROW, MYCOL, NPROW and NPCOL can be determined bycalling the subroutine blacs_gridinfo. If lwork = -1,then lwork is global input and a workspace query isassumed; the routine only calculates the minimum andoptimal size for all work arrays. Each of these values isreturned in the first entry of the corresponding work array,and no error message is issued by pxerbla.

Output Parameters

Overwritten by the product Q*sub(C), or Q'*sub(C), orsub(C)*Q', or sub(C)*Q.

c


work(1)

(global) INTEGER.info= 0: the execution is successful.< 0: if the i-th argument is an array and the j-entry hadan illegal value, then info = - i

1765

p?stebzComputes the eigenvalues of a symmetrictridiagonal matrix by bisection.

Syntax

call psstebz(ictxt, range, order, n, vl, vu, il, iu, abstol, d, e, m, nsplit,w, iblock, isplit, work, iwork, liwork, info)

call pdstebz(ictxt, range, order, n, vl, vu, il, iu, abstol, d, e, m, nsplit,w, iblock, isplit, work, iwork, liwork, info)

Description

This routine computes the eigenvalues of a symmetric tridiagonal matrix in parallel. These maybe all eigenvalues, all eigenvalues in the interval [v1 vu], or the eigenvalues indexed ilthrough iu. A static partitioning of work is done at the beginning of p?stebz which results inall processes finding an (almost) equal number of eigenvalues.

Input Parameters

(global) INTEGER. The BLACS context handle.ictxt

(global) CHARACTER. Must be 'A' or 'V' or 'I'.rangeIf range = 'A', the routine computes all eigenvalues.If range = 'V', the routine computes eigenvalues in theinterval [v1 vu].If range ='I', the routine computes eigenvalues withindices il to iu.

(global) CHARACTER. Must be 'B' or 'E'.orderIf order = 'B', the eigenvalues are to be ordered fromsmallest to largest within each split-off block.If order = 'E', the eigenvalues for the entire matrix areto be ordered from smallest to largest.

(global) INTEGER. The order of the tridiagonal matrix T

(n≥0).

n

(global)vl, vuREAL for psstebzDOUBLE PRECISION for pdstebz.

1766


If range = 'V', the routine computes the lower and theupper bounds for the eigenvalues on the interval [1, vu].If range = 'A' or 'I', vl and vu are not referenced.

(global)il, iuINTEGER. Constraint: 1≤il≤iu≤n.If range = 'I', the index of the smallest eigenvalue isreturned for il and of the largest eigenvalue for iu(assuming that the eigenvalues are in ascending order)must be returned. il must be at least 1. iu must be at leastil and no greater than n.If range = 'A' or 'V', il and iu are not referenced.

(global)abstolREAL for psstebzDOUBLE PRECISION for pdstebz.The absolute tolerance to which each eigenvalue is required.An eigenvalue (or cluster) is considered to have converged

if it lies in an interval of width abstol. If abstol≤0, thenthe tolerance is taken as ulp||T||, where ulp is themachine precision, and ||T|| means the 1-norm of TEigenvalues will be computed most accurately when abstolis set to the underflow threshold slamch('U'), not 0. Notethat if eigenvectors are desired later by inverse iteration(p?stein), abstol should be set to 2*p?lamch('S').

(global)dREAL for psstebzDOUBLE PRECISION for pdstebz.Array, DIMENSION (n).Contains n diagonal elements of the tridiagonal matrix T.To avoid overflow, the matrix must be scaled so that itslargest entry is no greater than the overflow(1/2) *underflow(1/4) in absolute value, and for greatest accuracy,it should not be much smaller than that.

(global)eREAL for psstebzDOUBLE PRECISION for pdstebz.Array, DIMENSION (n - 1).

1767


Contains (n-1) off-diagonal elements of the tridiagonalmatrix T. To avoid overflow, the matrix must be scaled sothat its largest entry is no greater than overflow(1/2) *underflow(1/4) in absolute value, and for greatest accuracy,it should not be much smaller than that.

(local)workREAL for psstebzDOUBLE PRECISION for pdstebz.Array, DIMENSION max(5n, 7). This is a workspace array.

(local) INTEGER. The size of the work array must be ≥max(5n, 7).

lwork


(local) INTEGER. Array, DIMENSION max(4n, 14). This isa workspace array.

iwork

(local) INTEGER. the size of the iwork array must ≥max(4n, 14, NPROCS).

liwork

If liwork = -1, then liwork is global input and aworkspace query is assumed; the routine only calculatesthe minimum and optimal size for all work arrays. Each ofthese values is returned in the first entry of thecorresponding work array, and no error message is issuedby pxerbla.

Output Parameters

(global) INTEGER. The actual number of eigenvalues found.

0≤m≤n

m

(global) INTEGER. The number of diagonal blocks detected

in T. 1≤nsplit≤n

nsplit

(global)wREAL for psstebzDOUBLE PRECISION for pdstebz.

1768


Array, DIMENSION (n). On exit, the first m elements of wcontain the eigenvalues on all processes.

(global) INTEGER.iblockArray, DIMENSION (n). At each row/column j where e(j) iszero or small, the matrix T is considered to split into a blockdiagonal matrix. On exit iblock(i) specifies which block(from 1 to the number of blocks) the eigenvalue w(i)belongs to.

NOTE. In the (theoretically impossible) event thatbisection does not converge for some or alleigenvalues, info is set to 1 and the ones for whichit did not are identified by a negative block number.

(global) INTEGER.isplitArray, DIMENSION (n).Contains the splitting points, at which T breaks up intosubmatrices. The first submatrix consists of rows/columns1 to isplit(1), the second of rows/columns isplit(1)+1through isplit(2), etc., and the nsplit-th consists ofrows/columns isplit(nsplit-1)+1 throughisplit(nsplit)=n. (Only the first nsplit elements areused, but since the nsplit values are not known, n wordsmust be reserved for isplit.)

(global) INTEGER.infoIf info = 0, the execution is successful.If info < 0, if info = -i, the i-th argument has an illegalvalue.If info > 0, some or all of the eigenvalues fail to convergeor not computed.If info = 1, bisection fails to converge for someeigenvalues; these eigenvalues are flagged by a negativeblock number. The effect is that the eigenvalues may notbe as accurate as the absolute and relative tolerances.If info = 2, mismatch between the number of eigenvaluesoutput and the number desired.

1769

If info = 3: range='i', and the Gershgorin intervalinitially used is incorrect. No eigenvalues are computed.Probable cause: the machine has a sloppy floating pointarithmetic. Increase the fudge parameter, recompile, andtry again.

p?steinComputes the eigenvectors of a tridiagonal matrixusing inverse iteration.

Syntax

call psstein(n, d, e, m, w, iblock, isplit, orfac, z, iz, jz, descz, work,lwork, iwork, liwork, ifail, iclustr, gap, info)

call pdstein(n, d, e, m, w, iblock, isplit, orfac, z, iz, jz, descz, work,lwork, iwork, liwork, ifail, iclustr, gap, info)

call pcstein(n, d, e, m, w, iblock, isplit, orfac, z, iz, jz, descz, work,lwork, iwork, liwork, ifail, iclustr, gap, info)

call pzstein(n, d, e, m, w, iblock, isplit, orfac, z, iz, jz, descz, work,lwork, iwork, liwork, ifail, iclustr, gap, info)

Description

This routine computes the eigenvectors of a symmetric tridiagonal matrix T corresponding tospecified eigenvalues, by inverse iteration. p?stein does not orthogonalize vectors that areon different processes. The extent of orthogonalization is controlled by the input parameterlwork. Eigenvectors that are to be orthogonalized are computed by the same process. p?steindecides on the allocation of work among the processes and then calls ?stein2 (modified LAPACKroutine) on each individual process. If insufficient workspace is allocated, the expectedorthogonalization may not be done.

NOTE. If the eigenvectors obtained are not orthogonal, increase lwork and run the codeagain.

p = NPROW * NPCOL is the total number of processes.

1770


Input Parameters

(global) INTEGER. The order of the matrix T (n ≥ 0).n

(global) INTEGER. The number of eigenvectors to bereturned.

m

(global)d, e, wREAL for single-precision flavorsDOUBLE PRECISION for double-precision flavors.Arrays: d(*) contains the diagonal elements of T.DIMENSION (n).e(*) contains the off-diagonal elements of T.DIMENSION (n-1).w(*) contains all the eigenvalues grouped by split-offblock.The eigenvalues are supplied from smallest to largestwithin the block. (Here the output array w from p?stebzwith order = 'B' is expected. The array should be replicatedin all processes.)DIMENSION(m)

(global) INTEGER.iblockArray, DIMENSION (n). The submatrix indices associatedwith the corresponding eigenvalues in w--1 for eigenvaluesbelonging to the first submatrix from the top, 2 for thosebelonging to the second submatrix, etc. (The output arrayiblock from p?stebz is expected here).

(global) INTEGER.isplitArray, DIMENSION (n). The splitting points, at which T breaksup into submatrices. The first submatrix consists ofrows/columns 1 to isplit(1), the second of rows/columnsisplit(1)+1 through isplit(2), etc., and the nsplit-thconsists of rows/columns isplit(nsplit-1)+1 throughisplit(nsplit)=n . (The output array isplit fromp?stebz is expected here.)

(global)orfacREAL for single-precision flavorsDOUBLE PRECISION for double-precision flavors.

1771


orfac specifies which eigenvectors should be orthogonalized.Eigenvectors that correspond to eigenvalues withinorfac*||T|| of each other are to be orthogonalized.However, if the workspace is insufficient (see lwork), thistolerance may be decreased until all eigenvectors can bestored in one process. No orthogonalization is done if orfacis equal to zero. A default value of 1000 is used if orfac isnegative. orfac should be identical on all processes

(global) INTEGER. The row and column indices in the globalarray z indicating the first row and the first column of thesubmatrix Z, respectively.

iz, jz

(global and local) INTEGER array, dimension (dlen_). Thearray descriptor for the distributed matrix Z.

descz

(local). REAL for single-precision flavorsworkDOUBLE PRECISION for double-precision flavors.Workspace array, DIMENSION (lwork).

(local) INTEGER.lworklwork controls the extent of orthogonalization which can bedone. The number of eigenvectors for which storage isallocated on each process is nvec =floor((lwork-max(5*n,np00*mq00))/n). Eigenvectorscorresponding to eigenvalue clusters of size nvec -ceil(m/p) + 1 are guaranteed to be orthogonal (theorthogonality is similar to that obtained from ?stein2).

NOTE. lwork must be no smaller thanmax(5*n,np00*mq00) + ceil(m/p)*n and shouldhave the same input value on all processes.

It is the minimum value of lwork input on differentprocesses that is significant.If lwork = -1, then lwork is global input and a workspacequery is assumed; the routine only calculates the minimumand optimal size for all work arrays. Each of these valuesis returned in the first entry of the corresponding work array,and no error message is issued by pxerbla.

1772


(local) INTEGER.iworkWorkspace array, DIMENSION (3n+p+1).

(local) INTEGER. The size of the array iwork. It must be ≥3*n + p + 1.

liwork


Output Parameters

(local)zREAL for pssteinDOUBLE PRECISION for pdsteinCOMPLEX for pcsteinDOUBLE COMPLEX for pzstein.Array, DIMENSION (descz(dlen_), n/NPCOL + NB). zcontains the computed eigenvectors associated with thespecified eigenvalues. Any vector which fails to converge isset to its current iterate after MAXIT iterations (See?stein2). On output, z is distributed across the p processesin block cyclic format.

On exit, work(1) gives a lower bound on the workspace(lwork) that guarantees the user desired orthogonalization(see orfac). Note that this may overestimate the minimumworkspace needed.

work(1)

On exit, iwork(1) contains the amount of integer workspacerequired.

iwork

On exit, the iwork(2) through iwork(p+2) indicate theeigenvectors computed by each process. Process i computeseigenvectors indexed iwork(i+2)+1 through iwork(i+3).

(global). INTEGER. Array, DIMENSION (m). On normal exit,all elements of ifail are zero. If one or more eigenvectorsfail to converge after MAXIT iterations (as in ?stein), then

ifail

info > 0 is returned. If mod(info, m+1)>0, then for i=1

1773


to mod(info,m+1), the eigenvector corresponding to theeigenvalue w(ifail(i)) failed to converge (w refers to thearray of eigenvalues on output).

(global) INTEGER. Array, DIMENSION (2*p)iclustrThis output array contains indices of eigenvectorscorresponding to a cluster of eigenvalues that could not beorthogonalized due to insufficient workspace (see lwork,orfac and info). Eigenvectors corresponding to clusters ofeigenvalues indexed iclustr(2*I-1) to iclustr(2*I),i = 1 to info/(m+1), could not be orthogonalized due tolack of workspace. Hence the eigenvectors correspondingto these * clusters may not be orthogonal. iclustr is azero terminated array---(iclustr(2*k).ne.0.and.iclustr(2*k+1).eq.0)if and only if k is the number of clusters.

(global)gapREAL for single-precision flavorsDOUBLE PRECISION for double-precision flavors.This output array contains the gap between eigenvalueswhose eigenvectors could not be orthogonalized. The info/moutput values in this array correspond to the info/(m+1)clusters indicated by the array iclustr. As a result, the dotproduct between eigenvectors corresponding to the i-thcluster may be as high as (O(n)*macheps)/gap(i).

(global) INTEGER.infoIf info = 0, the execution is successful.If info < 0: If the i-th argument is an array and thej-entry had an illegal value, then info = -(i*100+j),If the i-th argument is a scalar and had an illegal value,then info = -i.If info < 0: if info = -i, the i-th argument had an illegalvalue.If info > 0: if mod(info, m+1) = i, then i eigenvectorsfailed to converge in MAXIT iterations. Their indices arestored in the array ifail. If info/(m+1) = i, theneigenvectors corresponding to i clusters of eigenvaluescould not be orthogonalized due to insufficient workspace.The indices of the clusters are stored in the array iclustr.

1774


This section describes ScaLAPACK routines for solving nonsymmetric eigenvalue problems,computing the Schur factorization of general matrices, as well as performing a number of relatedcomputational tasks.

To solve a nonsymmetric eigenvalue problem with ScaLAPACK, you usually need to reduce thematrix to the upper Hessenberg form and then solve the eigenvalue problem with the Hessenbergmatrix obtained.

Table 6-5 lists ScaLAPACK routines for reducing the matrix to the upper Hessenberg form byan orthogonal (or unitary) similarity transformation A = QHQH, as well as routines for solvingeigenproblems with Hessenberg matrices, and multiplying the matrix after reduction.

Table 6-5 Computational Routines for Solving Nonsymmetric Eigenproblems

HessenbergmatrixOrthogonal/Unitarymatrix

General matrixOperation performed

p?gehrdReduce to Hessenberg form A =QHQH

p?ormhr/p?unmhrMultiply the matrix after reduction

p?lahqrFind eigenvalues and Schurfactorization

p?gehrdReduces a general matrix to upper Hessenbergform.

Syntax

call psgehrd( n, ilo, ihi, a, ia, ja, desca, tau, work, lwork, info)

call pdgehrd( n, ilo, ihi, a, ia, ja, desca, tau, work, lwork, info)

call pcgehrd( n, ilo, ihi, a, ia, ja, desca, tau, work, lwork, info)

call pzgehrd( n, ilo, ihi, a, ia, ja, desca, tau, work, lwork, info)

Description

The routine reduces a real/complex general distributed matrix sub (A) to upper Hessenbergform H by an orthogonal or unitary similarity transformation

Q'*sub(A)*Q = H,

1775


where sub(A) = A(ia+n-1:ia+n-1, ja+n-1:ja+n-1).

Input Parameters


(n≥0).

n

(global) INTEGER.ilo, ihiIt is assumed that sub(A) is already upper triangular in rowsia:ia+ilo-2 and ia+ihi:ia+n-1 and columnsja:ja+ilo-2 and ja+ihi:ja+n-1. (See Application Notesbelow).

If n > 0, 1≤ilo≤ihi≤n; otherwise set ilo = 1, ihi =n.

(local) REAL for psgehrdaDOUBLE PRECISION for pdgehrdCOMPLEX for pcgehrdDOUBLE COMPLEX for pzgehrd.Pointer into the local memory to an array of dimension(lld_a, LOCc(ja+n-1)). On entry, this array containsthe local pieces of the n-by-n general distributed matrixsub(A) to be reduced.


ia, ja


desca

(local)workREAL for psgehrdDOUBLE PRECISION for pdgehrdCOMPLEX for pcgehrdDOUBLE COMPLEX for pzgehrd.Workspace array of dimension lwork.

(local or global) INTEGER, dimension of the array work.lwork is local input and must be at least

lwork

lwork≥NB*NB + NB*max(ihip+1, ihlp+inlq)where NB = mb_a = nb_a,iroffa = mod(ia-1, NB),

1776


icoffa = mod(ja-1, NB),ioff = mod(ia+ilo-2, NB), iarow = indxg2p(ia,NB, MYROW, rsrc_a, NPROW), ihip =numroc(ihi+iroffa, NB, MYROW, iarow, NPROW),ilrow = indxg2p(ia+ilo-1, NB, MYROW, rsrc_a,NPROW),ihlp = numroc(ihi-ilo+ioff+1, NB, MYROW, ilrow,NPROW),ilcol = indxg2p(ja+ilo-1, NB, MYCOL, csrc_a,NPCOL),inlq = numroc(n-ilo+ioff+1, NB, MYCOL, ilcol,NPCOL),indxg2p and numroc are ScaLAPACK tool functions; MYROW,MYCOL, NPROW and NPCOL can be determined by calling thesubroutine blacs_gridinfo.If lwork = -1, then lwork is global input and a workspacequery is assumed; the routine only calculates the minimumand optimal size for all work arrays. Each of these valuesis returned in the first entry of the corresponding work array,and no error message is issued by pxerbla.

Output Parameters

On exit, the upper triangle and the first subdiagonal ofsub(A)are overwritten with the upper Hessenberg matrix H,and the elements below the first subdiagonal, with the arraytau, represent the orthogonal/unitary matrix Q as a productof elementary reflectors. (See Application Notes below).

a

(local). REAL for psgehrdtauDOUBLE PRECISION for pdgehrdCOMPLEX for pcgehrdDOUBLE COMPLEX for pzgehrd.Array, DIMENSION at least max(ja+n-2).The scalar factors of the elementary reflectors (seeApplication Notes below). Elements ja:ja+ilo-2 andja+ihi:ja+n-2 of tau are set to zero. tau is tied to thedistributed matrix A.

1777



work(1)


Application Notes

The matrix Q is represented as a product of (ihi-ilo) elementary reflectors

Q = H(ilo) H(ilo+1)... H(ihi-1).


H(i)= i - tau * v * v'

where tau is a real/complex scalar, and v is a real/complex vector with v(1:i)= 0, v(i+1)=1 and v(ihi+1:n)= 0; v(i+2:ihi) is stored on exit in a(ia+ilo+i:ia+ihi-1,ja+ilo+i-2),and tau in tau(ja+ilo+i-2). The contents of a(ia:ia+n-1,ja:ja+n-1) are illustrated bythe following example, with n = 7, ilo = 2 and ihi = 6:

on entry

on exit

1778


where a denotes an element of the original matrix sub(A), H denotes a modified element of theupper Hessenberg matrix H, and vi denotes an element of the vector defining H(ja+ilo+i-2).

p?ormhrMultiplies a general matrix by the orthogonaltransformation matrix from a reduction toHessenberg form determined by p?gehrd.

Syntax

call psormhr(side, trans, m, n, ilo, ihi, a, ia, ja, desca, tau, c, ic, jc,descc, work, lwork, info)

call pdormhr(side, trans, m, n, ilo, ihi, a, ia, ja, desca, tau, c, ic, jc,descc, work, lwork, info)

Description

The routine overwrites the general real distributed m-by-n matrix sub(C)= C(ic:ic+m-1,jc:jc+n-1) with

side ='R'side ='L'



1779


where Q is a real orthogonal distributed matrix of order nq, with nq = m if side = 'L' and nq= n if side = 'R'.

Q is defined as the product of ihi-ilo elementary reflectors, as returned by p?gehrd.


Input Parameters




matrix sub (C) (m≥0).

m

(global) INTEGER. The number of columns in he distributed

matrix sub (C) (n≥0).

n

(global) INTEGER.ilo, ihiilo and ihi must have the same values as in the previouscall of p?gehrd. Q is equal to the unit matrix except for thedistributed submatrixQ(ia+ilo:ia+ihi-1,ia+ilo:ja+ihi-1).

If side = 'L', 1≤ilo≤ihi≤max(1,m);

If side = 'R', 1≤ilo≤ihi≤max(1,n);ilo and ihi are relative indexes.

(local)aREAL for psormhrDOUBLE PRECISION for pdormhrPointer into the local memory to an array of dimension(lld_a, LOCc(ja+m-1)) if side='L', and (lld_a,LOCc(ja+n-1)) if side = 'R'.Contains the vectors which define the elementary reflectors,as returned by p?gehrd.

1780



ia, ja


desca

(local)tauREAL for psormhrDOUBLE PRECISION for pdormhrArray, DIMENSION LOCc(ja+m-2), if side = 'L', andLOCc(ja+n-2) if side = 'R'.This array contains the scalar factors tau(j) of theelementary reflectors H(j) as returned by p?gehrd. tau istied to the distributed matrix A.

(local)cREAL for psormhrDOUBLE PRECISION for pdormhrPointer into the local memory to an array of dimension(lld_c,LOCc(jc+n-1)).Contains the local pieces of the distributed matrix sub(C).


ic, jc


descc

(local)workREAL for psormhrDOUBLE PRECISION for pdormhrWorkspace array with dimension lwork.

(local or global) INTEGER.lworkThe dimension of the array work.lwork must be at least iaa = ia + ilo; jaa =ja+ilo-1;If side = 'L',mi = ihi-ilo; ni = n; icc = ic + ilo; jcc = jc;

lwork ≥ max((nb_a*(nb_a-1))/2, (nqc0+mpc0)*nb_a)+ nb_a*nb_a

1781


else if side = 'R',mi = m; ni = ihi-ilo; icc = ic; jcc = jc + ilo;

lwork ≥ max((nb_a*(nb_a-1))/2,(nqc0+max(npa0+numroc(numroc(ni+icoffc, nb_a,0, 0, NPCOL), nb_a, 0, 0, lcmq), mpc0))*nb_a) +nb_a*nb_aend ifwhere lcmq = lcm/NPCOL with lcm = ilcm(NPROW,NPCOL),iroffa = mod(iaa-1, mb_a),icoffa = mod(jaa-1, nb_a),iarow = indxg2p(iaa, mb_a, MYROW, rsrc_a, NPROW),npa0 = numroc(ni+iroffa, mb_a, MYROW, iarow,NPROW),iroffc = mod(icc-1, mb_c), icoffc = mod(jcc-1,nb_c),icrow = indxg2p(icc, mb_c, MYROW, rsrc_c, NPROW),iccol = indxg2p(jcc, nb_c, MYCOL, csrc_c, NPCOL),mpc0 = numroc(mi+iroffc, mb_c, MYROW, icrow,NPROW),nqc0 = numroc(ni+icoffc, nb_c, MYCOL, iccol,NPCOL),ilcm, indxg2p and numroc are ScaLAPACK tool functions;MYROW, MYCOL, NPROW and NPCOL can be determined bycalling the subroutine blacs_gridinfo.If lwork = -1, then lwork is global input and a workspacequery is assumed; the routine only calculates the minimumand optimal size for all work arrays. Each of these valuesis returned in the first entry of the corresponding work array,and no error message is issued by pxerbla.

Output Parameters

sub(C) is overwritten by Q*sub(C), or Q'*sub(C), orsub(C)*Q', or sub(C)*Q.

c


work(1)


1782


= 0: the execution is successful.< 0: if the i-th argument is an array and the j-entry hadan illegal value, then info = - (i* 100+j), if the i-thargument is a scalar and had an illegal value, then info =-i.

p?unmhrMultiplies a general matrix by the unitarytransformation matrix from a reduction toHessenberg form determined by p?gehrd.

Syntax

call pcunmhr(side, trans, m, n, ilo, ihi, a, ia, ja, desca, tau, c, ic, jc,descc, work, lwork, info)

call pzunmhr(side, trans, m, n, ilo, ihi, a, ia, ja, desca, tau, c, ic, jc,descc, work, lwork, info)

Description

The routine overwrites the general complex distributed m-by-n matrix sub(C) =C(ic:ic+m-1,jc:jc+n-1) with

side ='R'side ='L'


sub(C)*QHQH*sub(C)trans = 'H':

where Q is a complex unitary distributed matrix of order nq, with nq = m if side = 'L' andnq = n if side = 'R'.

Q is defined as the product of ihi-ilo elementary reflectors, as returned by p?gehrd.


Input Parameters


(global) CHARACTERtrans

1783

='N', no transpose, Q is applied.='C', conjugate transpose, QH is applied.


submatrix sub (C) (m≥0).

m


submatrix sub (C) (n≥0).

n

(global) INTEGERilo, ihiThese must be the same parameters ilo and ihi,respectively, as supplied to p?gehrd. Q is equal to the unitmatrix except in the distributed submatrix Q(ia+ilo:ia+ihi-1,ia+ilo:ja+ihi-1).

If side ='L', then 1≤ilo≤ihi≤max(1,m).

If side = 'R', then 1≤ilo≤ihi≤max(1,n)ilo and ihi are relative indexes.

(local)aCOMPLEX for pcunmhrDOUBLE COMPLEX for pzunmhr.Pointer into the local memory to an array of dimension(lld_a, LOC c(ja+m-1)) if side='L', and (lld_a,LOCc(ja+n-1)) if side = 'R'.Contains the vectors which define the elementary reflectors,as returned by p?gehrd.


ia, ja


desca

(local)tauCOMPLEX for pcunmhrDOUBLE COMPLEX for pzunmhr.Array, DIMENSION LOCc(ja+m-2), if side = 'L', andLOCc(ja+n-2) if side = 'R'.This array contains the scalar factors tau(j) of theelementary reflectors H(j) as returned by p?gehrd. tau istied to the distributed matrix A.

1784


(local)cCOMPLEX for pcunmhrDOUBLE COMPLEX for pzunmhr.Pointer into the local memory to an array of dimension(lld_c, LOCc(jc+n-1)).Contains the local pieces of the distributed matrix sub(C).


ic, jc


descc

(local)workCOMPLEX for pcunmhrDOUBLE COMPLEX for pzunmhr.Workspace array with dimension lwork.

(local or global)lworkThe dimension of the array work.lwork must be at least iaa = ia + ilo; jaa =ja+ilo-1;If side = 'L', mi = ihi-ilo; ni = n; icc = ic + ilo;

jcc = jc; lwork ≥ max((nb_a*(nb_a-1))/2,(nqc0+mpc0)*nb_a) + nb_a*nb_aelse if side = 'R',mi = m; ni = ihi-ilo; icc = ic; jcc = jc + ilo;

lwork ≥ max((nb_a*(nb_a-1))/2, (nqc0 +max(npa0+numroc(numroc(ni+icoffc, nb_a, 0, 0,NPCOL), nb_a, 0, 0, lcmq ), mpc0))*nb_a) +nb_a*nb_aend ifwhere lcmq = lcm/NPCOL with lcm = ilcm(NPROW,NPCOL),iroffa = mod(iaa-1, mb_a),icoffa = mod(jaa-1, nb_a),iarow = indxg2p(iaa, mb_a, MYROW, rsrc_a, NPROW),npa0 = numroc(ni+iroffa, mb_a, MYROW, iarow,NPROW),iroffc = mod(icc-1, mb_c),

1785


icoffc = mod(jcc-1, nb_c),icrow = indxg2p(icc, mb_c, MYROW, rsrc_c, NPROW),iccol = indxg2p(jcc, nb_c, MYCOL, csrc_c, NPCOL),mpc0 = numroc(mi+iroffc, mb_c, MYROW, icrow,NPROW),nqc0 = numroc(ni+icoffc, nb_c, MYCOL, iccol,NPCOL),ilcm, indxg2p and numroc are ScaLAPACK tool functions;MYROW, MYCOL, NPROW and NPCOL can be determined bycalling the subroutine blacs_gridinfo.If lwork = -1, then lwork is global input and a workspacequery is assumed; the routine only calculates the minimumand optimal size for all work arrays. Each of these valuesis returned in the first entry of the corresponding work array,and no error message is issued by pxerbla.

Output Parameters

C is overwritten by Q* sub(C) or Q'*sub(C) or sub(C)*Q' orsub(C)*Q.

c


work(1)


1786


p?lahqrComputes the Schur decomposition and/oreigenvalues of a matrix already in Hessenbergform.

Syntax

call pslahqr(wantt, wantz, n, ilo, ihi, a, desca, wr, wi, iloz, ihiz, z,descz, work, lwork, iwork, ilwork, info)

call pdlahqr(wantt, wantz, n, ilo, ihi, a, desca, wr, wi, iloz, ihiz, z,descz, work, lwork, iwork, ilwork, info)

Description

This is an auxiliary routine used to find the Schur decomposition and/or eigenvalues of a matrixalready in Hessenberg form from columns ilo to ihi.

Input Parameters

(global) LOGICALwanttIf wantt = .TRUE., the full Schur form T is required;If wantt = .FALSE., only eigenvalues are required.

(global) LOGICAL.wantzIf wantz = .TRUE., the matrix of Schur vectors z isrequired;If wantz = .FALSE., Schur vectors are not required.

(global) INTEGER. The order of the Hessenberg matrix A

(and z if wantz). (n≥0).

n

(global) INTEGER.ilo, ihiIt is assumed that A is already upper quasi-triangular inrows and columns ihi+1:n, and that A(ilo, ilo-1) = 0(unless ilo = 1). p?lahqr works primarily with theHessenberg submatrix in rows and columns ilo to ihi, butapplies transformations to all of h if wantt is .TRUE..

1≤ilo≤max(1,ihi); ihi ≤ n.

(global)aREAL for pslahqr

1787


DOUBLE PRECISION for pdlahqrArray, DIMENSION (desca(lld_),*). On entry, the upperHessenberg matrix A.


desca

(global) INTEGER. Specify the rows of z to whichtransformations must be applied if wantz is .TRUE..

1≤iloz≤ilo; ihi≤ihiz≤n.

iloz, ihiz

(global ) REAL for pslahqrzDOUBLE PRECISION for pdlahqrArray. If wantz is .TRUE., on entry z must contain thecurrent matrix Z of transformations accumulated bypdhseqr. If wantz is .FALSE., z is not referenced.


descz

(local)workREAL for pslahqrDOUBLE PRECISION for pdlahqrWorkspace array with dimension lwork.

(local) INTEGER. The dimension of work. lwork is assumed

big enough so that lwork≥3*n +max(2*max(descz(lld_),desca(lld_)) + 2*LOCq(n),7*ceil(n/hbl)/lcm(NPROW,NPCOL))).

lwork

If lwork = -1, then work(1)gets set to the above numberand the code returns immediately.

(global and local) INTEGER array of size ilwork.iwork

(local) INTEGER This holds some of the iblk integer arrays.ilwork

Output Parameters

On exit, if wantt is .TRUE., A is upper quasi-triangular inrows and columns ilo:ihi, with any 2-by-2 or largerdiagonal blocks not yet in standard form. If wantt is.FALSE., the contents of A are unspecified on exit.

a


work(1)

1788


(global replicated output)wr, wiREAL for pslahqrDOUBLE PRECISION for pdlahqrArrays, DIMENSION(n) each. The real and imaginary parts,respectively, of the computed eigenvalues ilo to ihi arestored in the corresponding elements of wr and wi. If twoeigenvalues are computed as a complex conjugate pair,they are stored in consecutive elements of wr and wi, saythe i-th and (i+1)-th, with wi(i)> 0 and wi(i+1) < 0. Ifwantt is .TRUE. , the eigenvalues are stored in the sameorder as on the diagonal of the Schur form returned in A. Amay be returned with larger diagonal blocks until the nextrelease.

On exit z has been updated; transformations are appliedonly to the submatrix z(iloz:ihiz, ilo:ihi).

z

(global) INTEGER.info= 0: the execution is successful.< 0: parameter number -info incorrect or inconsistent> 0: p?lahqr failed to compute all the eigenvalues ilo toihi in a total of 30*(ihi-ilo+1) iterations; if info = i,elements i+1:ihi of wr and wi contain those eigenvalueswhich have been successfully computed.


This section describes ScaLAPACK routines for computing the singular value decomposition(SVD) of a general m-by-n matrix A (see “Singular Value Decomposition” in LAPACK chapter).

To find the SVD of a general matrix A, this matrix is first reduced to a bidiagonal matrix B bya unitary (orthogonal) transformation, and then SVD of the bidiagonal matrix is computed.Note that the SVD of B is computed using the LAPACK routine ?bdsqr .

Table 6-6 lists ScaLAPACK computational routines for performing this decomposition.

Table 6-6 Computational Routines for Singular Value Decomposition (SVD)

Orthogonal/unitary matrixGeneral matrixOperation

p?gebrdReduce A to a bidiagonal matrix

p?ormbr/p?unmbrMultiply matrix after reduction

1789

p?gebrdReduces a general matrix to bidiagonal form.

Syntax

call psgebrd(m, n, a, ia, ja, desca, d, e, tauq, taup, work, lwork, info)

call pdgebrd(m, n, a, ia, ja, desca, d, e, tauq, taup, work, lwork, info)

call pcgebrd(m, n, a, ia, ja, desca, d, e, tauq, taup, work, lwork, info)

call pzgebrd(m, n, a, ia, ja, desca, d, e, tauq, taup, work, lwork, info)

Description

The routine reduces a real/complex general m-by-n distributed matrix sub(A)=A(ia:ia+m-1,ja:ja+n-1) to upper or lower bidiagonal form B by an orthogonal/unitarytransformation:

Q'*sub(A)*P = B.

If m≥ n, B is upper bidiagonal; if m < n, B is lower bidiagonal.

Input Parameters


matrix sub(A) (m≥0).

m


matrix sub(A) (n≥0).

n

(local)aREAL for psgebrdDOUBLE PRECISION for pdgebrdCOMPLEX for pcgebrdDOUBLE COMPLEX for pzgebrd.Real pointer into the local memory to an array of dimension(lld_a, LOCc(ja+n-1)). On entry, this array containsthe distributed matrix sub (A).


ia, ja

1790

desca

(local)workREAL for psgebrdDOUBLE PRECISION for pdgebrdCOMPLEX for pcgebrdDOUBLE COMPLEX for pzgebrd. Workspace array ofdimension lwork.


lwork

lwork ≥ nb*(mpa0 + nqa0+1)+ nqa0where nb = mb_a = nb_a,iroffa = mod(ia-1, nb),icoffa = mod(ja-1, nb),iarow = indxg2p(ia, nb, MYROW, rsrc_a, NPROW),iacol = indxg2p (ja, nb, MYCOL, csrc_a, NPCOL),mpa0 = numroc(m +iroffa, nb, MYROW, iarow,NPROW),nqa0 = numroc(n +icoffa, nb, MYCOL, iacol,NPCOL),indxg2p and numroc are ScaLAPACK tool functions; MYROW,MYCOL, NPROW and NPCOL can be determined by calling thesubroutine blacs_gridinfo.If lwork = -1, then lwork is global input and a workspacequery is assumed; the routine only calculates the minimumand optimal size for all work arrays. Each of these valuesis returned in the first entry of the corresponding work array,and no error message is issued by pxerbla.

Output Parameters

On exit, if m≥n, the diagonal and the first superdiagonal ofsub(A) are overwritten with the upper bidiagonal matrix B;the elements below the diagonal, with the array tauq,

a

represent the orthogonal/unitary matrix Q as a product ofelementary reflectors, and the elements above the firstsuperdiagonal, with the array taup, represent the orthogonalmatrix P as a product of elementary reflectors. If m < n,

1791

the diagonal and the first subdiagonal are overwritten withthe lower bidiagonal matrix B; the elements below the firstsubdiagonal, with the array tauq, represent theorthogonal/unitary matrix Q as a product of elementaryreflectors, and the elements above the diagonal, with thearray taup, represent the orthogonal matrix P as a productof elementary reflectors. (See Application Notes below)

(local)dREAL for single-precision flavorsDOUBLE PRECISION for double-precision flavors. Array,

DIMENSION LOCc(ja+min(m,n)-1) if m≥n;LOCr(ia+min(m,n)-1) otherwise. The distributed diagonalelements of the bidiagonal matrix B: d(i) = a(i,i).d is tied to the distributed matrix A.

(local)eREAL for single-precision flavorsDOUBLE PRECISION for double-precision flavors. Array,

DIMENSION LOCr(ia+min(m,n)-1) if m≥n;LOCc(ja+min(m,n)-2) otherwise. The distributedoff-diagonal elements of the bidiagonal distributed matrixB:

If m≥n, e(i) = a(i,i+1) for i = 1,2,..., n-1; if m< n, e(i) = a(i+1, i) for i = 1,2,...,m-1. e is tiedto the distributed matrix A.

(local)tauq, taupREAL for psgebrdDOUBLE PRECISION for pdgebrdCOMPLEX for pcgebrdDOUBLE COMPLEX for pzgebrd.Arrays, DIMENSION LOCc(ja+min(m,n)-1) for tauq andLOCr(ia+min(m,n)-1) for taup. Contain the scalar factorsof the elementary reflectors which represent theorthogonal/unitary matrices Q and P, respectively. tauq andtaup are tied to the distributed matrix A. (See ApplicationNotes below)

1792

work(1)


Application Notes

The matrices Q and P are represented as products of elementary reflectors:

If m ≥ n,

Q = H(1) H(2)... H(n) and P = G(1) G(2)... G(n-1).

Each H(i) and G(i) has the form:

H(i)= i - tauq * v * v' and G(i) = i - taup * u * u'

where tauq and taup are real/complex scalars, and v and u are real/complex vectors;

v(1:i-1) = 0, v(i) = 1, and v(i+1:m) is stored on exit in A(ia+i:ia+m-1,ja+i-1);

u(1:i) = 0, u(i+1) = 1, and u(i+2:n) is stored on exit in A (ia+i-1,ja+i+1:ja+n-1);

tauq is stored in tauq(ja+i-1) and taup in taup(ia+i-1).

If m < n,

Q = H(1) H(2)... H (m-1) and P = G (1) G(2)... G (m)

Each H (i) and G(i) has the form:

H(i)= i - tauq * v * v' and G(i)= i - taup * u * u'

here tauq and taup are real/complex scalars, and v and u are real/complex vectors;

v(1:i) = 0, v(i+1) = 1, and v(i+2:m) is stored on exit in A (ia+i:ia+m-1,ja+i-1);u(1:i-1) = 0, u(i) = 1, and u(i+1:n) is stored on exit in A(ia+i-1,ja+i+1:ja+n-1);

tauq is stored in tauq(ja+i-1) and taup in taup(ia+i-1).

The contents of sub(A) on exit are illustrated by the following examples:

m = 6 and n = 5 (m > n):

1793

m = 5 and n = 6 (m < n):

where d and e denote diagonal and off-diagonal elements of B, vi denotes an element of thevector defining H(i), and ui an element of the vector defining G(i).

1794

p?ormbrMultiplies a general matrix by one of the orthogonalmatrices from a reduction to bidiagonal formdetermined by p?gebrd.

Syntax

call psormbr(vect, side, trans, m, n, k, a, ia, ja, desca, tau, c, ic, jc,descc, work, lwork, info)

call pdormbr(vect, side, trans, m, n, k, a, ia, ja, desca, tau, c, ic, jc,descc, work, lwork, info)

Description

If vect = 'Q', the routine overwrites the general real distributed m-by-n matrix sub(C) =C(c:ic+m-1,jc:jc+n-1) with

side ='R'side ='L'

sub(C) QQ sub(C)trans = 'N':

sub(C) QTQT sub(C)trans = 'T':

If vect = 'P', the routine overwrites sub(C) with

side ='R'side ='L'

sub(C) PP sub(C)trans = 'N':

sub(C) PTPT sub(C)trans = 'T':

Here Q and PT are the orthogonal distributed matrices determined by p?gebrd when reducinga real distributed matrix A(ia:*, ja:*) to bidiagonal form: A(ia:*,ja:*) = Q*B*PT. Q andPT are defined as products of elementary reflectors H(i) and G(i) respectively.

Let nq = m if side = 'L' and nq = n if side = 'R'. Thus nq is the order of the orthogonalmatrix Q or PT that is applied.

If vect = 'Q', A(ia:*, ja:*) is assumed to have been an nq-by-k matrix:

If nq ≥ k, Q = H(1) H(2)...H(k);

If nq < k, Q = H(1) H(2)...H(nq-1).

If vect = 'P', A(ia:*, ja:*) is assumed to have been a k-by-nq matrix:

1795

If k < nq, P = G(1) G(2)...G(k);

If k ≥ nq, P = G(1) G(2)...G(nq-1).

Input Parameters

(global) CHARACTER.vectIf vect ='Q', then Q or QT is applied.If vect ='P', then P or PT is applied.

(global) CHARACTER.sideIf side ='L', then Q or QT, P or PT is applied from the left.If side ='R', then Q or QT, P or PT is applied from the right.

(global) CHARACTER.transIf trans = 'N', no transpose, Q or P is applied.If trans = 'T', then QT or PT is applied.

(global) INTEGER. The number of rows in the distributedmatrix sub (C).

m

(global) INTEGER. The number of columns in the distributedmatrix sub (C).

n

(global) INTEGER.kIf vect = 'Q', the number of columns in the originaldistributed matrix reduced by p?gebrd;If vect = 'P', the number of rows in the originaldistributed matrix reduced by p?gebrd.

Constraints: k ≥ 0.

(local)aREAL for psormbrDOUBLE PRECISION for pdormbr.Pointer into the local memory to an array of dimension(lld_a, LOCc(ja+min(nq,k)-1))If vect='Q', and (lld_a, LOCc(ja+nq-1))If vect = 'P'.nq = m if side = 'L', and nq = n otherwise.The vectors which define the elementary reflectors H(i) andG(i), whose products determine the matrices Q and P, asreturned by p?gebrd.

If vect = 'Q', lld_a≥max(1, LOCr(ia+nq-1));

1796

If vect = 'P', lld_a≥max(1, LOCr(ia+min(nq, k)-1)).


ia, ja


desca

(local)tauREAL for psormbrDOUBLE PRECISION for pdormbr.Array, DIMENSION LOCc(ja+min(nq, k)-1), if vect ='Q', and LOCr(ia+min(nq, k)-1), if vect = 'P'.tau(i) must contain the scalar factor of the elementaryreflector H(i) or G(i), which determines Q or P, as returnedby pdgebrd in its array argument tauq or taup. tau is tiedto the distributed matrix A.

(local) REAL for psormbrcDOUBLE PRECISION for pdormbrPointer into the local memory to an array of dimension(lld_a, LOCc (jc+n-1)).Contains the local pieces of the distributed matrix sub (C).


ic, jc


descc

(local)workREAL for psormbrDOUBLE PRECISION for pdormbr.Workspace array of dimension lwork.


lwork

If side = 'L'nq = m;

if ((vect = 'Q' and nq≥k) or (vect is not equal to 'Q'and nq>k)), iaa=ia; jaa=ja; mi=m; ni=n; icc=ic;jcc=jc;

1797


elseiaa= ia+1; jaa=ja; mi=m-1; ni=n; icc=ic+1; jcc= jc;end ifelseIf side = 'R', nq = n;

if((vect = 'Q' and nq≥k) or (vect is not equalto 'Q' and nq>k)),iaa=ia; jaa=ja; mi=m; ni=n; icc=ic; jcc=jc;elseiaa= ia; jaa= ja+1; mi= m; ni= n-1; icc= ic; jcc=jc+1;end ifend ifIf vect = 'Q',

If side = 'L', lwork≥max((nb_a*(nb_a-1))/2, (nqc0 +mpc0)*nb_a) + nb_a * nb_aelse if side = 'R',

lwork≥max((nb_a*(nb_a-1))/2, (nqc0 + max(npa0 +numroc(numroc(ni+icoffc, nb_a, 0, 0, NPCOL),nb_a, 0, 0, lcmq), mpc0))*nb_a) + nb_a*nb_aend ifelse if vect is not equal to 'Q', if side = 'L',

lwork≥max((mb_a*(mb_a-1))/2, (mpc0 + max(mqa0 +numroc(numroc(mi+iroffc, mb_a, 0, 0, NPROW),mb_a, 0, 0, lcmp), nqc0))*mb_a) + mb_a*mb_aelse if side = 'R',

lwork≥max((mb_a*(mb_a-1))/2, (mpc0 + nqc0)*mb_a)+ mb_a*mb_aend ifend ifwhere lcmp = lcm/NPROW, lcmq = lcm/NPCOL, withlcm = ilcm(NPROW, NPCOL),iroffa = mod(iaa-1, mb_a),icoffa = mod(jaa-1, nb_a),iarow = indxg2p(iaa, mb_a, MYROW, rsrc_a, NPROW),iacol = indxg2p(jaa, nb_a, MYCOL, csrc_a, NPCOL),

1798


mqa0 = numroc(mi+icoffa, nb_a, MYCOL, iacol,NPCOL),npa0 = numroc(ni+iroffa, mb_a, MYROW, iarow,NPROW),iroffc = mod(icc-1, mb_c),icoffc = mod(jcc-1, nb_c),icrow = indxg2p(icc, mb_c, MYROW, rsrc_c, NPROW),iccol = indxg2p(jcc, nb_c, MYCOL, csrc_c, NPCOL),mpc0 = numroc(mi+iroffc, mb_c, MYROW, icrow,NPROW),nqc0 = numroc(ni+icoffc, nb_c, MYCOL, iccol,NPCOL),indxg2p and numroc are ScaLAPACK tool functions; MYROW,MYCOL, NPROW and NPCOL can be determined by calling thesubroutine blacs_gridinfo.If lwork = -1, then lwork is global input and a workspacequery is assumed; the routine only calculates the minimumand optimal size for all work arrays. Each of these valuesis returned in the first entry of the corresponding work array,and no error message is issued by pxerbla.

Output Parameters

On exit, if vect='Q', sub(C) is overwritten by Q*sub(C), orQ'*sub(C), or sub(C)*Q', or sub(C)*Q; if vect='P', sub(C)is overwritten by P*sub(C), or P'*sub(C), or sub(C)*P, orsub(C)*P'.

c


work(1)


1799


p?unmbrMultiplies a general matrix by one of the unitarytransformation matrices from a reduction tobidiagonal form determined by p?gebrd.

Syntax

call pcunmbr(vect, side, trans, m, n, k, a, ia, ja, desca, tau, c, ic, jc,descc, work, lwork, info)

call pzunmbr(vect, side, trans, m, n, k, a, ia, ja, desca, tau, c, ic, jc,descc, work, lwork, info)

Description

If vect = 'Q', the routine overwrites the general complex distributed m-by-n matrix sub(C)= C(ic:ic+m-1, jc:jc+n-1) with

side ='R'side ='L'



If vect = 'P', the routine overwrites sub(C) with

side ='R'side ='L'

sub(C)*PP*sub(C)trans = 'N':

sub(C)*PHPH*sub(C)trans = 'C':

Here Q and PH are the unitary distributed matrices determined by p?gebrd when reducing acomplex distributed matrix A(ia:*, ja:*) to bidiagonal form: A(ia:*,ja:*) = Q*B*PH.

Q and PH are defined as products of elementary reflectors H(i) and G(i) respectively.

Let nq = m if side = 'L' and nq = n if side = 'R'. Thus nq is the order of the unitarymatrix Q or PH that is applied.

If vect = 'Q', A(ia:*, ja:*) is assumed to have been an nq-by-k matrix:

If nq ≥ k, Q = H(1) H(2)... H(k);

If nq < k, Q = H(1) H(2)... H(nq-1).

If vect = 'P', A(ia:*, ja:*) is assumed to have been a k-by-nq matrix:

1800

If k < nq, P = G(1) G(2)... G(k);

If k ≥ nq, P = G(1) G(2)... G(nq-1).

Input Parameters

(global) CHARACTER.vectIf vect ='Q', then Q or QH is applied.If vect ='P', then P or PH is applied.

(global) CHARACTER.sideIf side ='L', then Q or QH, P or PH is applied from the left.If side ='R', then Q or QH, P or PH is applied from the right.

(global) CHARACTER.transIf trans = 'N', no transpose, Q or P is applied.If trans = 'C', conjugate transpose, QH or PH is applied.


matrix sub (C) m≥0.

m


matrix sub (C) n≥0.

n

(global) INTEGER.kIf vect = 'Q', the number of columns in the originaldistributed matrix reduced by p?gebrd;If vect = 'P', the number of rows in the originaldistributed matrix reduced by p?gebrd.

Constraints: k ≥ 0.

(local)aCOMPLEX for psormbrDOUBLE COMPLEX for pdormbr.Pointer into the local memory to an array of dimension(lld_a, LOCc(ja+min(nq,k)-1)) if vect='Q', and(lld_a, LOCc(ja+nq-1)) if vect = 'P'.nq = m if side = 'L', and nq = n otherwise.The vectors which define the elementary reflectors H(i) andG(i), whose products determine the matrices Q and P, asreturned by p?gebrd.

If vect = 'Q', lld_a ≥ max(1, LOCr(ia+nq-1));

1801

If vect = 'P', lld_a ≥ max(1, LOCr(ia+min(nq,k)-1)).


ia, ja


desca

(local)tauCOMPLEX for pcunmbrDOUBLE COMPLEX for pzunmbr.Array, DIMENSION LOCc(ja+min(nq, k)-1), if vect ='Q', and LOCr(ia+min(nq, k)-1), if vect = 'P'.tau(i) must contain the scalar factor of the elementaryreflector H(i) or G(i), which determines Q or P, as returnedby p?gebrd in its array argument tauq or taup. tau is tiedto the distributed matrix A.

(local) COMPLEX for pcunmbrcDOUBLE COMPLEX for pzunmbrPointer into the local memory to an array of dimension(lld_a, LOCc (jc+n-1)).Contains the local pieces of the distributed matrix sub (C).


ic, jc


descc

(local)workCOMPLEX for pcunmbrDOUBLE COMPLEX for pzunmbr.Workspace array of dimension lwork.


lwork

If side = 'L'nq = m;

1802


if ((vect = 'Q' and nq ≥ k) or (vect is not equalto 'Q' and nq > k)), iaa= ia; jaa= ja; mi= m; ni=n; icc= ic; jcc= jc;elseiaa= ia+1; jaa= ja; mi= m-1; ni= n; icc= ic+1; jcc=jc;end ifelseIf side = 'R', nq = n;

if ((vect = 'Q' and nq ≥ k) or (vect is not equal

to 'Q' and nq ≥ k)),iaa= ia; jaa= ja; mi= m; ni= n; icc= ic; jcc= jc;elseiaa= ia; jaa= ja+1; mi= m; ni= n-1; icc= ic; jcc=jc+1;end ifend ifIf vect = 'Q',

If side = 'L', lwork ≥ max((nb_a*(nb_a-1))/2,(nqc0+mpc0)*nb_a) + nb_a*nb_aelse if side = 'R',

lwork ≥ max((nb_a*(nb_a-1))/2, (nqc0 +max(npa0+numroc(numroc(ni+icoffc, nb_a, 0, 0,NPCOL), nb_a, 0, 0, lcmq), mpc0))*nb_a) +nb_a*nb_aend ifelse if vect is not equal to 'Q',if side = 'L',

lwork ≥ max((mb_a*(mb_a-1))/2, (mpc0 +max(mqa0+numroc(numroc(mi+iroffc, mb_a, 0, 0,NPROW), mb_a, 0, 0, lcmp), nqc0))*mb_a) +mb_a*mb_aelse if side = 'R',

lwork ≥ max((mb_a*(mb_a-1))/2, (mpc0 +nqc0)*mb_a) + mb_a*mb_aend if

1803


end ifwhere lcmp = lcm/NPROW, lcmq = lcm/NPCOL, with lcm= ilcm(NPROW, NPCOL),iroffa = mod(iaa-1, mb_a),icoffa = mod(jaa-1, nb_a),iarow = indxg2p(iaa, mb_a, MYROW, rsrc_a, NPROW),iacol = indxg2p(jaa, nb_a, MYCOL, csrc_a, NPCOL),mqa0 = numroc(mi+icoffa, nb_a, MYCOL, iacol, NPCOL),npa0 = numroc(ni+iroffa, mb_a, MYROW, iarow, NPROW),iroffc = mod(icc-1, mb_c),icoffc = mod(jcc-1, nb_c),icrow = indxg2p(icc, mb_c, MYROW, rsrc_c, NPROW),iccol = indxg2p(jcc, nb_c, MYCOL, csrc_c, NPCOL),mpc0 = numroc(mi+iroffc, mb_c, MYROW, icrow,NPROW),nqc0 = numroc(ni+icoffc, nb_c, MYCOL, iccol,NPCOL),indxg2p and numroc are ScaLAPACK tool functions; MYROW,MYCOL, NPROW and NPCOL can be determined by calling thesubroutine blacs_gridinfo.If lwork = -1, then lwork is global input and a workspacequery is assumed; the routine only calculates the minimumand optimal size for all work arrays. Each of these valuesis returned in the first entry of the corresponding work array,and no error message is issued by pxerbla.

Output Parameters

On exit, if vect='Q', sub(C) is overwritten by Q*sub(C),or Q'*sub(C), or sub(C)*Q', or sub(C)*Q; if vect='P',sub(C) is overwritten by P*sub(C), or P'*sub(C), orsub(C)*P, or sub(C)*P'.

c


work(1)


1804



Generalized Symmetric-Definite Eigen Problems

This section describes ScaLAPACK routines that allow you to reduce the generalizedsymmetric-definite eigenvalue problems (see Generalized Symmetric-Definite Eigenvalue

Problems in LAPACK chapters) to standard symmetric eigenvalue problem Cy = λy, which youcan solve by calling ScaLAPACK routines described earlier in this chapter (see SymmetricEigenproblems).

Table 6-7 lists these routines.

Table 6-7 Computational Routines for Reducing Generalized Eigenproblems to StandardProblems

Complex Hermitian matricesReal symmetric matricesOperation

p?hegstp?sygstReduce to standard problems

p?sygstReduces a real symmetric-definite generalizedeigenvalue problem to the standard form.

Syntax

call pssygst( ibtype, uplo, n, a, ia, ja, desca, b, ib, jb, descb, scale,info )

call pdsygst( ibtype, uplo, n, a, ia, ja, desca, b, ib, jb, descb, scale,info )

Description

This routine reduces real symmetric-definite generalized eigenproblems to the standard form.

In the following sub(A) denotes A(ia:ia+n-1, ja:ja+n-1) and sub(B) denotesB(ib:ib+n-1, jb:jb+n-1).

If ibtype = 1, the problem is

sub(A)*x = λ*sub(B)*x,

1805


and sub(A) is overwritten by inv(UT)*sub(A)*inv(U), or inv(L)*sub(A)*inv(LT).

If ibtype = 2 or 3, the problem is

sub(A)*sub(B)*x = λ*x, or sub(B)*sub(A)*x = λ*x,

and sub(A) is overwritten by U*sub(A)*UT, or LT*sub(A)*L.

sub(B) must have been previously factorized as UT*U or L*LT by p?potrf.

Input Parameters

(global) INTEGER. Must be 1 or 2 or 3.ibtypeIf itype = 1, compute inv(UT)*sub(A)*inv(U), orinv(L)*sub(A)*inv(LT);If itype = 2 or 3, compute U*sub(A)*UT, orLT*sub(A)*L.

(global) CHARACTER. Must be 'U' or 'L'.uploIf uplo = 'U', the upper triangle of sub(A) is stored andsub (B) is factored as UT*U.If uplo = 'L', the lower triangle of sub(A) is stored andsub (B) is factored as L*LT.

(global) INTEGER. The order of the matrices sub (A) and

sub (B) (n ≥ 0).

n

(local)aREAL for pssygstDOUBLE PRECISION for pdsygst.Pointer into the local memory to an array of dimension(lld_a, LOCc(ja+n-1)). On entry, the array contains thelocal pieces of the n-by-n symmetric distributed matrixsub(A).If uplo = 'U', the leading n-by-n upper triangular part ofsub(A) contains the upper triangular part of the matrix, andits strictly lower triangular part is not referenced.If uplo = 'L', the leading n-by-n lower triangular part ofsub(A) contains the lower triangular part of the matrix, andits strictly upper triangular part is not referenced.

1806


ia, ja


desca

(local)bREAL for pssygstDOUBLE PRECISION for pdsygst.Pointer into the local memory to an array of dimension(lld_b, LOCc(jb+n-1)). On entry, the array contains thelocal pieces of the triangular factor from the Choleskyfactorization of sub (B) as returned by p?potrf.


ib, jb


descb

Output Parameters

On exit, if info = 0, the transformed matrix, stored in thesame format as sub(A).

a

(global)scaleREAL for pssygstDOUBLE PRECISION for pdsygst.Amount by which the eigenvalues should be scaled tocompensate for the scaling performed in this routine. Atpresent, scale is always returned as 1.0, it is returned hereto allow for future enhancement.

(global) INTEGER.infoIf info = 0, the execution is successful. If info < 0, if thei-th argument is an array and the j-entry had an illegalvalue, then info = -(i*100+j), if the i-th argument isa scalar and had an illegal value, then info = -i.

1807

p?hegstReduces a Hermitian-definite generalizedeigenvalue problem to the standard form.

Syntax

call pchegst( ibtype, uplo, n, a, ia, ja, desca, b, ib, jb, descb, scale,info)

call pzhegst( ibtype, uplo, n, a, ia, ja, desca, b, ib, jb, descb, scale,info)

Description

This routine reduces complex Hermitian-definite generalized eigenproblems to the standardform.

In the following sub(A) denotes A(ia:ia+n-1, ja:ja+n-1) and sub(B) denotesB(ib:ib+n-1, jb:jb+n-1).



and sub(A) is overwritten by inv(UH)*sub(A)*inv(U), or inv(L)*sub(A)*inv(LH).


sub(A)*sub(B)*x = λ*x, or sub(B)*sub(A)*x = λ*x,

and sub(A) is overwritten by U*sub(A)*UH, or LH*sub(A)*L.

sub(B) must have been previously factorized as UH*U or L*LH by p?potrf.

Input Parameters

(global) INTEGER. Must be 1 or 2 or 3.ibtypeIf itype = 1, compute inv(UH)*sub(A)*inv(U), orinv(L)*sub(A)*inv(LH);If itype = 2 or 3, compute U*sub(A)*UH, or LH*sub(A)*L.

(global) CHARACTER. Must be 'U' or 'L'.uploIf uplo = 'U', the upper triangle of sub(A) is stored andsub (B) is factored as UH*U.

1808


If uplo = 'L', the lower triangle of sub(A) is stored andsub (B) is factored as L*LH.

(global) INTEGER. The order of the matrices sub (A) and

sub (B) (n≥0).

n

(local)aCOMPLEX for pchegstDOUBLE COMPLEX for pzhegst.Pointer into the local memory to an array of dimension(lld_a, LOCc(ja+n-1)). On entry, the array contains thelocal pieces of the n-by-n Hermitian distributed matrixsub(A). If uplo = 'U', the leading n-by-n upper triangularpart of sub(A) contains the upper triangular part of thematrix, and its strictly lower triangular part is notreferenced. If uplo = 'L', the leading n-by-n lowertriangular part of sub(A) contains the lower triangular partof the matrix, and its strictly upper triangular part is notreferenced.


ia, ja


desca

(local)bCOMPLEX for pchegstDOUBLE COMPLEX for pzhegst.Pointer into the local memory to an array of dimension(lld_b, LOCc(jb+n-1)). On entry, the array contains thelocal pieces of the triangular factor from the Choleskyfactorization of sub (B) as returned by p?potrf.


ib, jb


descb

1809


Output Parameters

On exit, if info = 0, the transformed matrix, stored in thesame format as sub(A).

a

(global)scaleREAL for pchegstDOUBLE PRECISION for pzhegst.Amount by which the eigenvalues should be scaled tocompensate for the scaling performed in this routine. Atpresent, scale is always returned as 1.0, it is returned hereto allow for future enhancement.

(global) INTEGER.infoIf info = 0, the execution is successful. If info <0, if thei-th argument is an array and the j-entry had an illegalvalue, then info = -(i100+j), if the i-th argument is ascalar and had an illegal value, then info = -i.

Driver RoutinesTable 6-8 lists ScaLAPACK driver routines available for solving systems of linear equations,linear least-squares problems, standard eigenvalue and singular value problems, and generalizedsymmetric definite eigenproblems.

Table 6-8 ScaLAPACK Driver Routines

DriverMatrix type, storage schemeType of Problem

p?gesv (simple driver)p?gesvx(expert driver)

general (partial pivoting)Linear equations

p?gbsv (simple driver)general band (partial pivoting)

p?dbsv (simple driver)general band (no pivoting)

p?dtsv (simple driver)general tridiagonal (no pivoting)

p?posv (simple driver)p?posvx(expert driver)

symmetric/Hermitianpositive-definite

p?pbsv (simple driver)symmetric/Hermitianpositive-definite, band

p?ptsv (simple driver)symmetric/Hermitianpositive-definite, tridiagonal

p?gelsgeneral m-by-nLinear least squares problem

p?syev (simple driver) p?syevx /p?heevx (expert driver)

symmetric/HermitianSymmetric eigenvalueproblem

1810

DriverMatrix type, storage schemeType of Problem

p?gesvdgeneral m-by-nSingular value decomposition

p?sygvx / p?hegvx (expert driver)symmetric/Hermitian, one matrixalso positive-definite

Generalized symmetricdefinite eigenvalue problem

p?gesvComputes the solution to the system of linearequations with a square distributed matrix andmultiple right-hand sides.

Syntax

call psgesv(n, nrhs, a, ia, ja, desca, ipiv, b, ib, jb, descb, info)

call pdgesv(n, nrhs, a, ia, ja, desca, ipiv, b, ib, jb, descb, info)

call pcgesv(n, nrhs, a, ia, ja, desca, ipiv, b, ib, jb, descb, info)

call pzgesv(n, nrhs, a, ia, ja, desca, ipiv, b, ib, jb, descb, info)

Description

The routine p?gesv computes the solution to a real or complex system of linear equationssub(A)*X = sub(B), where sub(A) = A(ia:ia+n-1, ja:ja+n-1) is an n-by-n distributedmatrix and X and sub(B) = B(ib:ib+n-1, jb:jb+nrhs-1) are n-by-nrhs distributed matrices.

The LU decomposition with partial pivoting and row interchanges is used to factor sub(A) assub(A) = P*L*U, where P is a permutation matrix, L is unit lower triangular, and U is uppertriangular. L and U are stored in sub(A). The factored form of sub(A) is then used to solve thesystem of equations sub(A)*X = sub(B).

Input Parameters


sub(A) (n ≥ 0).

n

(global) INTEGER. The number of right hand sides, that is,the number of columns of the distributed submatrices B and

X (nrhs ≥ 0).

nrhs

(local)a, bREAL for psgesv

1811


DOUBLE PRECISION for pdgesvCOMPLEX for pcgesvDOUBLE COMPLEX for pzgesv.Pointers into the local memory to arrays of local dimensiona(lld_a,LOCc(ja+n-1)) andb(lld_b,LOCc(jb+nrhs-1)), respectively.On entry, the array a contains the local pieces of the n-by-ndistributed matrix sub(A) to be factored.On entry, the array b contains the right hand side distributedmatrix sub(B).

(global) INTEGER. The row and column indices in the globalarray A indicating the first row and the first column ofsub(A), respectively.

ia, ja


desca

(global) INTEGER. The row and column indices in the globalarray B indicating the first row and the first column ofsub(B), respectively.

ib, jb


descb

Output Parameters

Overwritten by the factors L and U from the factorizationsub(A) = P*L*U; the unit diagonal elements of L are notstored .

a

Overwritten by the solution distributed matrix X.b

(local) INTEGER array.ipivThe dimension of ipiv is (LOCr(m_a)+mb_a). This arraycontains the pivoting information. The (local) row i of thematrix was interchanged with the (global) row ipiv(i).This array is tied to the distributed matrix A.

(global) INTEGER. If info=0, the execution is successful.infoinfo < 0:

1812

If the i-th argument is an array and the j-th entry had anillegal value, then info = -(i*100+j); if the i-thargument is a scalar and had an illegal value, then info =-i.info > 0:If info = k, U(ia+k-1,ja+k-1) is exactly zero. Thefactorization has been completed, but the factor U is exactlysingular, so the solution could not be computed.

p?gesvxUses the LU factorization to compute the solutionto the system of linear equations with a squarematrix A and multiple right-hand sides, andprovides error bounds on the solution.

Syntax

call psgesvx(fact, trans, n, nrhs, a, ia, ja, desca, af, iaf, jaf, descaf,ipiv, equed, r, c, b, ib, jb, descb, x, ix, jx, descx, rcond, ferr, berr,work, lwork, iwork, liwork, info)

call pdgesvx(fact, trans, n, nrhs, a, ia, ja, desca, af, iaf, jaf, descaf,ipiv, equed, r, c, b, ib, jb, descb, x, ix, jx, descx, rcond, ferr, berr,work, lwork, iwork, liwork, info)

call pcgesvx(fact, trans, n, nrhs, a, ia, ja, desca, af, iaf, jaf, descaf,ipiv, equed, r, c, b, ib, jb, descb, x, ix, jx, descx, rcond, ferr, berr,work, lwork, rwork, lrwork, info)

call pzgesvx(fact, trans, n, nrhs, a, ia, ja, desca, af, iaf, jaf, descaf,ipiv, equed, r, c, b, ib, jb, descb, x, ix, jx, descx, rcond, ferr, berr,work, lwork, rwork, lrwork, info)

Description

This routine uses the LU factorization to compute the solution to a real or complex system oflinear equations AX = B, where A denotes the n-by-n submatrix A(ia:ia+n-1, ja:ja+n-1),B denotes the n-by-nrhs submatrix B(ib:ib+n-1, jb:jb+nrhs-1) and X denotes the n-by-nrhssubmatrix X(ix:ix+n-1, jx:jx+nrhs-1).


1813


In the following description, af stands for the subarray af(iaf:iaf+n-1, jaf:jaf+n-1).

The routine p?gesvx performs the following steps:

1. If fact = 'E', real scaling factors R and C are computed to equilibrate the system:

trans = 'N': diag(R)*A*diag(C) *diag(C)-1*X = diag(R)*B

trans = 'T': (diag(R)*A*diag(C))T *diag(R)-1*X = diag(C)*B

trans = 'C': (diag(R)*A*diag(C))H *diag(R)-1*X = diag(C)*B

Whether or not the system will be equilibrated depends on the scaling of the matrix A, butif equilibration is used, A is overwritten by diag(R)*A*diag(C) and B by diag(R)*B (iftrans='N') or diag(c)*B (if trans = 'T' or 'C').

2. If fact = 'N' or 'E', the LU decomposition is used to factor the matrix A (after equilibrationif fact = 'E') as A = P L U, where P is a permutation matrix, L is a unit lower triangularmatrix, and U is upper triangular.

3. The factored form of A is used to estimate the condition number of the matrix A. If thereciprocal of the condition number is less than relative machine precision, steps 4 - 6 areskipped.



6. If equilibration was used, the matrix X is premultiplied by diag(C) (if trans = 'N') ordiag(R) (if trans = 'T' or 'C') so that it solves the original system before equilibration.

Input Parameters

(global) CHARACTER*1. Must be 'F', 'N', or 'E'.factSpecifies whether or not the factored form of the matrix Ais supplied on entry, and if not, whether the matrix A shouldbe equilibrated before it is factored.If fact = 'F' then, on entry, af and ipiv contain thefactored form of A. If equed is not 'N', the matrix A hasbeen equilibrated with scaling factors given by r and c.Arrays a, af, and ipiv are not modified.If fact = 'N', the matrix A is copied to af and factored.If fact = 'E', the matrix A is equilibrated if necessary,then copied to af and factored.

1814


(global) CHARACTER*1. Must be 'N', 'T', or 'C'.transSpecifies the form of the system of equations:If trans = 'N', the system has the form A*X = B (Notranspose);If trans = 'T', the system has the form AT*X = B

(Transpose);If trans = 'C', the system has the form AH*X = B

(Conjugate transpose);


of the submatrix A (n ≥ 0).

n

(global) INTEGER. The number of right hand sides; thenumber of columns of the distributed submatrices B and X

(nrhs ≥ 0).

nrhs

(local)a, af, b, workREAL for psgesvxDOUBLE PRECISION for pdgesvxCOMPLEX for pcgesvxDOUBLE COMPLEX for pzgesvx.Pointers into the local memory to arrays of local dimensiona(lld_a,LOCc(ja+n-1)), af(lld_af,LOCc(ja+n-1)),b(lld_b,LOCc(jb+nrhs-1)), work(lwork), respectively.The array a contains the matrix A. If fact = 'F' and equedis not 'N', then A must have been equilibrated by the scalingfactors in r and/or c.The array af is an input argument if fact = 'F'. In thiscase it contains on entry the factored form of the matrix A,that is, the factors L and U from the factorization A = P*L*Uas computed by p?getrf. If equed is not 'N', then af isthe factored form of the equilibrated matrix A.The array b contains on entry the matrix B whose columnsare the right-hand sides for the systems of equations.work(*) is a workspace array. The dimension of work is(lwork).

(global) INTEGER. The row and column indices in the globalarray A indicating the first row and the first column of thesubmatrix A(ia:ia+n-1, ja:ja+n-1), respectively.

ia, ja

1815



desca

(global) INTEGER. The row and column indices in the globalarray af indicating the first row and the first column of thesubarray af(iaf:iaf+n-1, jaf:jaf+n-1), respectively.

iaf, jaf


descaf

(global) INTEGER. The row and column indices in the globalarray B indicating the first row and the first column of thesubmatrix B(ib:ib+n-1, jb:jb+nrhs-1), respectively.

ib, jb


descb

(local) INTEGER array.ipivThe dimension of ipiv is (LOCr(m_a)+mb_a).The array ipiv is an input argument if fact = 'F' .On entry, it contains the pivot indices from the factorizationA = P*L*U as computed by p?getrf; (local) row i of thematrix was interchanged with the (global) row ipiv(i).This array must be aligned with A(ia:ia+n-1, *).

(global) CHARACTER*1. Must be 'N', 'R', 'C', or 'B'.equed is an input argument if fact = 'F' . It specifies theform of equilibration that was done:

equed

If equed = 'N', no equilibration was done (always true iffact = 'N');If equed = 'R', row equilibration was done, that is, A hasbeen premultiplied by diag(r);If equed = 'C', column equilibration was done, that is, Ahas been postmultiplied by diag(c);If equed = 'B', both row and column equilibration wasdone; A has been replaced by diag(r)*A*diag(c).

(local) REAL for single precision flavors;r, cDOUBLE PRECISION for double precision flavors.Arrays, dimension LOCr(m_a) and LOCc(n_a), respectively.The array r contains the row scale factors for A, and thearray c contains the column scale factors for A. These arraysare input arguments if fact = 'F' only; otherwise they

1816


are output arguments. If equed = 'R' or 'B', A ismultiplied on the left by diag(r); if equed = 'N' or 'C', ris not accessed.If fact = 'F' and equed = 'R' or 'B', each element ofr must be positive.If equed = 'C' or 'B', A is multiplied on the right bydiag(c); if equed = 'N' or 'R', c is not accessed.If fact = 'F' and equed = 'C' or 'B', each element ofc must be positive. Array r is replicated in every processcolumn, and is aligned with the distributed matrix A. Arrayc is replicated in every process row, and is aligned with thedistributed matrix A.

(global) INTEGER. The row and column indices in the globalarray X indicating the first row and the first column of thesubmatrix X(ix:ix+n-1, jx:jx+nrhs-1), respectively.

ix, jx


descx

(local or global) INTEGER. The dimension of the array work; must be at least max(p?gecon(lwork),p?gerfs(lwork))+LOCr(n_a) .

lwork

(local, psgesvx/pdgesvx only) INTEGER. Workspace array.The dimension of iwork is (liwork).

iwork

(local, psgesvx/pdgesvx only) INTEGER. The dimension ofthe array iwork , must be at least LOCr(n_a) .

liwork

(local) REAL for pcgesvxrworkDOUBLE PRECISION for pzgesvx.Workspace array, used in complex flavors only.The dimension of rwork is (lrwork).

(local or global, pcgesvx/pzgesvx only) INTEGER. Thedimension of the array rwork;must be at least 2*LOCc(n_a).

lrwork

Output Parameters

(local)xREAL for psgesvxDOUBLE PRECISION for pdgesvx

1817


COMPLEX for pcgesvxDOUBLE COMPLEX for pzgesvx.Pointer into the local memory to an array of local dimensionx(lld_x,LOCc(jx+nrhs-1)).If info = 0, the array x contains the solution matrix X tothe original system of equations. Note that A and B are

modified on exit if equed ≠ 'N', and the solution to theequilibrated system is:diag(C)-1*X, if trans = 'N' and equed = 'C' or 'B';and diag(R)-1*X, if trans = 'T' or 'C' and equed ='R' or 'B'.

Array a is not modified on exit if fact = 'F' or 'N', or iffact = 'E' and equed = 'N'.

a

If equed ≠ 'N', A is scaled on exit as follows:equed = 'R': A = diag(R)*Aequed = 'C': A = A*diag(c)equed = 'B': A = diag(R)*A*diag(c)

If fact = 'N' or 'E', then af is an output argument andon exit returns the factors L and U from the factorization A= P*L*U of the original matrix A (if fact = 'N') or of theequilibrated matrix A (if fact = 'E'). See the descriptionof a for the form of the equilibrated matrix.

af

Overwritten by diag(R)*B if trans = 'N' and equed ='R' or 'B';

b

overwritten by diag(c)*B if trans = 'T' and equed ='C' or 'B'; not changed if equed = 'N'.

These arrays are output arguments if fact ≠ 'F'.r, c

See the description of r, c in Input Arguments section.

(global) REAL for single precision flavors.rcondDOUBLE PRECISION for double precision flavors.An estimate of the reciprocal condition number of the matrixA after equilibration (if done). The routine sets rcond =0 ifthe estimate underflows; in this case the matrix is singular(to working precision). However, anytime rcond is smallcompared to 1.0, for the working precision, the matrix maybe poorly conditioned or even singular.

1818


(local) REAL for single precision flavorsferr, berrDOUBLE PRECISION for double precision flavors.Arrays, DIMENSION LOCc(n_b) each. Contain thecomponent-wise forward and relative backward errors,respectively, for each solution vector.Arrays ferr and berr are both replicated in every processrow, and are aligned with the matrices B and X.

If fact = 'N' or 'E', then ipiv is an output argumentand on exit contains the pivot indices from the factorizationA = P*L*U of the original matrix A (if fact = 'N') or ofthe equilibrated matrix A (if fact = 'E').

ipiv

If fact ≠ 'F' , then equed is an output argument. Itspecifies the form of equilibration that was done (see thedescription of equed in Input Arguments section).

equed

If info=0, on exit work(1) returns the minimum value oflwork required for optimum performance.

work(1)

If info=0, on exit iwork(1) returns the minimum value ofliwork required for optimum performance.

iwork(1)

If info=0, on exit rwork(1) returns the minimum value oflrwork required for optimum performance.

rwork(1)

INTEGER. If info=0, the execution is successful.infoinfo < 0: if the ith argument is an array and the jth entryhad an illegal value, then info = -(i*100+j); if the ithargument is a scalar and had an illegal value, then info =

-i. If info = i, and i ≤ n, then U(i,i) is exactly zero.The factorization has been completed, but the factor U isexactly singular, so the solution and error bounds could notbe computed. If info = i, and i = n +1, then U isnonsingular, but rcond is less than machine precision. Thefactorization has been completed, but the matrix is singularto working precision and the solution and error bounds havenot been computed.

1819

p?gbsvComputes the solution to the system of linearequations with a general banded distributed matrixand multiple right-hand sides.

Syntax

call psgbsv(n, bwl, bwu, nrhs, a, ja, desca, ipiv, b, ib, descb, work, lwork,info)

call pdgbsv(n, bwl, bwu, nrhs, a, ja, desca, ipiv, b, ib, descb, work, lwork,info)

call pcgbsv(n, bwl, bwu, nrhs, a, ja, desca, ipiv, b, ib, descb, work, lwork,info)

call pzgbsv(n, bwl, bwu, nrhs, a, ja, desca, ipiv, b, ib, descb, work, lwork,info)

Description

The routine p?gbsv computes the solution to a real or complex system of linear equations

sub(A)*X = sub(B),

where sub(A) = A(1:n, ja:ja+n-1) is an n-by-n real/complex general banded distributedmatrix with bwl subdiagonals and bwu superdiagonals, and X and sub(B)= B(ib:ib+n-1,1:rhs) are n-by-nrhs distributed matrices.

The LU decomposition with partial pivoting and row interchanges is used to factor sub(A) assub(A) = P*L*U*Q, where P and Q are permutation matrices, and L and U are banded lowerand upper triangular matrices, respectively. The matrix Q represents reordering of columns forthe sake of parallelism, while P represents reordering of rows for numerical stability using classicpartial pivoting.

Input Parameters


sub(A) (n ≥ 0).

n

(global) INTEGER. The number of subdiagonals within the

band of A (0≤ bwl ≤ n-1 ).

bwl

1820


(global) INTEGER. The number of superdiagonals within the

band of A (0≤ bwu ≤ n-1 ).

bwu


(nrhs ≥ 0).

nrhs

(local)a, bREAL for psgbsvDOUBLE PRECISON for pdgbsvCOMPLEX for pcgbsvDOUBLE COMPLEX for pzgbsv.Pointers into the local memory to arrays of local dimensiona(lld_a,LOCc(ja+n-1)) and b(lld_b,LOCc(nrhs)),respectively.On entry, the array a contains the local pieces of the globalarray A.On entry, the array b contains the right hand side distributedmatrix sub(B).


ja


desca




ib


descb



(local)workREAL for psgbsvDOUBLE PRECISON for pdgbsvCOMPLEX for pcgbsv

1821


DOUBLE COMPLEX for pzgbsv.Workspace array of dimension (lwork).


be at least lwork ≥(NB+bwu)*(bwl+bwu)+6*(bwl+bwu)*(bwl+2*bwu) +

lwork

+ max(nrhs *(NB+2*bwl+4*bwu), 1).

Output Parameters

On exit, contains details of the factorization. Note that theresulting factorization is not the same factorization asreturned from LAPACK. Additional permutations areperformed on the matrix for the sake of parallelism.

a


b

(local) INTEGER array.ipivThe dimension of ipiv must be at least desca(NB). Thisarray contains pivot indices for local factorizations. Youshould not alter the contents between factorization andsolve.


work(1)


info

If the ith argument is an array and the jth entry had anillegal value, then info = -(i*100+j); if the ith argumentis a scalar and had an illegal value, then info = -i.info > 0:

If info = k ≤ NPROCS, the submatrix stored on processorinfo and factored locally was not nonsingular, and thefactorization was not completed. If info = k > NPROCS,the submatrix stored on processor info-NPROCSrepresenting interactions with other processors was notnonsingular, and the factorization was not completed.

1822


p?dbsvSolves a general band system of linear equations.

Syntax

call psdbsv(n, bwl, bwu, nrhs, a, ja, desca, b, ib, descb, work, lwork, info)

call pddbsv(n, bwl, bwu, nrhs, a, ja, desca, b, ib, descb, work, lwork, info)

call pcdbsv(n, bwl, bwu, nrhs, a, ja, desca, b, ib, descb, work, lwork, info)

call pzdbsv(n, bwl, bwu, nrhs, a, ja, desca, b, ib, descb, work, lwork, info)

Description

This routine solves the system of linear equations

A(1:n, ja:ja+n-1)* X = B(ib:ib+n-1, 1:nrhs),

where A(1:n, ja:ja+n-1) is an n-by-n real/complex banded diagonally dominant-likedistributed matrix with bandwidth bwl, bwu.

Gaussian elimination without pivoting is used to factor a reordering of the matrix into LU.

Input Parameters

(global) INTEGER. The order of the distributed submatrix A,

(n ≥ 0).

n

(global) INTEGER. Number of subdiagonals. 0 ≤ bwl ≤n-1.

bwl

(global) INTEGER. Number of subdiagonals. 0 ≤ bwu ≤n-1.

bwu

(global) INTEGER. The number of right-hand sides; thenumber of columns of the distributed submatrix B, (nrhs

≥ 0).

nrhs

(local). REAL for psdbsvaDOUBLE PRECISION for pddbsvCOMPLEX for pcdbsvDOUBLE COMPLEX for pzdbsv.

1823


Pointer into the local memory to an array with first

dimension lld_a ≥ (bwl+bwu+1) (stored in desca). Onentry, this array contains the local pieces of the distributedmatrix.

(global) INTEGER. The index in the global array a that pointsto the start of the matrix to be operated on (which may beeither all of A or a submatrix of A).

ja

(global and local) INTEGER array of dimension dlen.desca

If 1d type (dtype_a=501 or 502), dlen ≥ 7;

If 2d type (dtype_a=1), dlen ≥ 9.The array descriptor for the distributed matrix A.Contains information of mapping of A to memory.

(local)bREAL for psdbsvDOUBLE PRECISON for pddbsvCOMPLEX for pcdbsvDOUBLE COMPLEX for pzdbsv.Pointer into the local memory to an array of local lead

dimension lld_b ≥ nb. On entry, this array contains thelocal pieces of the right hand sides B(ib:ib+n-1, 1:nrhs).

(global) INTEGER. The row index in the global array b thatpoints to the first row of the matrix to be operated on (whichmay be either all of b or a submatrix of B).

ib

(global and local) INTEGER array of dimension dlen.descb

If 1d type (dtype_b =502), dlen ≥ 7;

If 2d type (dtype_b =1), dlen ≥ 9.The array descriptor for the distributed matrix B.Contains information of mapping of B to memory.

(local).workREAL for psdbsvDOUBLE PRECISON for pddbsvCOMPLEX for pcdbsvDOUBLE COMPLEX for pzdbsv.

1824


Temporary workspace. This space may be overwritten inbetween calls to routines. work must be the size given inlwork.

(local or global) INTEGER. Size of user-input workspacework. If lwork is too small, the minimal acceptable size willbe returned in work(1) and an error code is returned.

lwork

lwork ≥nb(bwl+bwu)+6max(bwl,bwu)*max(bwl,bwu)+max((max(bwl,bwu)nrhs),max(bwl,bwu)*max(bwl,bwu))

Output Parameters

On exit, this array contains information containing detailsof the factorization.

a

Note that permutations are performed on the matrix, sothat the factors returned are different from those returnedby LAPACK.

On exit, this contains the local piece of the solutionsdistributed matrix X.

b

On exit, work(1) contains the minimal lwork.work

(local) INTEGER. If info=0, the execution is successful.info< 0: If the i-th argument is an array and the j-entry hadan illegal value, then info = -(i*100+j), if the i-thargument is a scalar and had an illegal value, then info =-i.> 0: If info = k < NPROCS, the submatrix stored onprocessor info and factored locally was not positive definite,and the factorization was not completed.If info = k > NPROCS, the submatrix stored on processorinfo-NPROCS representing interactions with otherprocessors was not positive definite, and the factorizationwas not completed.

1825

p?dtsvSolves a general tridiagonal system of linearequations.

Syntax

call psdtsv(n, nrhs, dl, d, du, ja, desca, b, ib, descb, work, lwork, info)

call pddtsv(n, nrhs, dl, d, du, ja, desca, b, ib, descb, work, lwork, info)

call pcdtsv(n, nrhs, dl, d, du, ja, desca, b, ib, descb, work, lwork, info)

call pzdtsv(n, nrhs, dl, d, du, ja, desca, b, ib, descb, work, lwork, info)

Description


A(1:n, ja:ja+n-1) * X = B(ib:ib+n-1, 1:nrhs),

where A(1:n, ja:ja+n-1) is an n-by-n complex tridiagonal diagonally dominant-like distributedmatrix.

Gaussian elimination without pivoting is used to factor a reordering of the matrix into L U.

Input Parameters

(global) INTEGER. The order of the distributed submatrix A

(n ≥ 0).

n

INTEGER. The number of right hand sides; the number of

columns of the distributed matrix B (nrhs ≥ 0).

nrhs

(local). REAL for psdtsvdlDOUBLE PRECISION for pddtsvCOMPLEX for pcdtsvDOUBLE COMPLEX for pzdtsv.Pointer to local part of global vector storing the lowerdiagonal of the matrix. Globally, dl(1)is not referenced, anddl must be aligned with d. Must be of size > desca( nb_ ).

(local). REAL for psdtsvdDOUBLE PRECISION for pddtsvCOMPLEX for pcdtsv

1826


DOUBLE COMPLEX for pzdtsv.Pointer to local part of global vector storing the maindiagonal of the matrix.

(local). REAL for psdtsvduDOUBLE PRECISION for pddtsvCOMPLEX for pcdtsvDOUBLE COMPLEX for pzdtsv.Pointer to local part of global vector storing the upperdiagonal of the matrix. Globally, du(n) is not referenced,and du must be aligned with d.


ja




(local)bREAL for psdtsvDOUBLE PRECISONfor pddtsvCOMPLEX for pcdtsvDOUBLE COMPLEX for pzdtsv.Pointer into the local memory to an array of local leaddimension lld_b > nb. On entry, this array contains thelocal pieces of the right hand sides B(ib:ib+n-1, 1:nrhs).


ib


If 1d type (dtype_b =502), dlen ≥ 7;

If 2d type (dtype_b =1), dlen ≥ 9.The array descriptor for the distributed matrix B.Contains information of mapping of B to memory.

(local).work

1827


REAL for psdtsvDOUBLE PRECISON for pddtsvCOMPLEX for pcdtsvDOUBLE COMPLEX for pzdtsv. Temporary workspace. Thisspace may be overwritten in between calls to routines. workmust be the size given in lwork.

(local or global) INTEGER. Size of user-input workspacework. If lwork is too small, the minimal acceptable size willbe returned in work(1) and an error code is returned. lwork> (12*NPCOL+3*nb)+max((10+2*min(100,nrhs))*NPCOL+4*nrhs, 8*NPCOL)

lwork

Output Parameters

On exit, this array contains information containing the *factors of the matrix.

dl

On exit, this array contains information containing the *factors of the matrix. Must be of size > desca( nb_ ).

d

On exit, this array contains information containing the *factors of the matrix. Must be of size > desca( nb_ ).

du


b


(local) INTEGER. If info=0, the execution is successful.info< 0: If the i-th argument is an array and the j-entry hadan illegal value, then info = -(i*100+j), if the i-thargument is a scalar and had an illegal value, then info =-i.> 0: If info = k < NPROCS, the submatrix stored onprocessor info and factored locally was not positive definite,and the factorization was not completed.If info = k > NPROCS, the submatrix stored on processorinfo-NPROCS representing interactions with otherprocessors was not positive definite, and the factorizationwas not completed.

1828

p?posvSolves a symmetric positive definite system oflinear equations.

Syntax

call psposv(uplo, n, nrhs, a, ia, ja, desca, b, ib, jb, descb, info)

call pdposv(uplo, n, nrhs, a, ia, ja, desca, b, ib, jb, descb, info)

call pcposv(uplo, n, nrhs, a, ia, ja, desca, b, ib, jb, descb, info)

call pzposv(uplo, n, nrhs, a, ia, ja, desca, b, ib, jb, descb, info)

Description

This routine computes the solution to a real/complex system of linear equations

sub(A)*X = sub(B),

where sub(A) denotes A(ia:ia+n-1,ja:ja+n-1) and is an n-by-n symmetric/Hermitiandistributed positive definite matrix and X and sub(B) denoting B(ib:ib+n-1,jb:jb+nrhs-1)are n-by-nrhs distributed matrices. The Cholesky decomposition is used to factor sub(A) as

sub(A) = UT*U, if uplo = 'U', or

sub(A) = L*LT, if uplo = 'L',

where U is an upper triangular matrix and L is a lower triangular matrix. The factored form ofsub(A) is then used to solve the system of equations.

Input Parameters

(global). CHARACTER. Must be 'U' or 'L'.uploIndicates whether the upper or lower triangular part ofsub(A) is stored.


sub(A) (n ≥ 0).

n


columns of the distributed submatrix sub(B) (nrhs ≥ 0).

nrhs

(local)aREAL for psposv

1829


DOUBLE PRECISION for pdposvCOMPLEX for pcposvCOMPLEX*16 for pzposv.Pointer into the local memory to an array of dimension(lld_a, LOCc(ja+n-1)). On entry, this array containsthe local pieces of the n-by-n symmetric distributed matrixsub(A) to be factored.If uplo = 'U', the leading n-by-n upper triangular part ofsub(A) contains the upper triangular part of the matrix, andits strictly lower triangular part is not referenced.If uplo = 'L', the leading n-by-n lower triangular part ofsub(A) contains the lower triangular part of the distributedmatrix, and its strictly upper triangular part is notreferenced.


ia, ja


desca

(local)bREAL for psposvDOUBLE PRECISON for pdposvCOMPLEX for pcposvCOMPLEX*16 for pzposv.Pointer into the local memory to an array of dimension(lld_b,LOC(jb+nrhs-1)). On entry, the local pieces ofthe right hand sides distributed matrix sub(B).


ib, jb


descb

1830


Output Parameters

On exit, if info = 0, this array contains the local pieces ofthe factor U or L from the Cholesky factorization sub(A) =UH*U, or L*LH.

a

On exit, if info = 0, sub(B) is overwritten by the solutiondistributed matrix X.

b

(global) INTEGER.infoIf info =0, the execution is successful.If info < 0: If the i-th argument is an array and thej-entry had an illegal value, then info = -(i*100+j), ifthe i-th argument is a scalar and had an illegal value, theninfo = -i.If info > 0: If info = k, the leading minor of order k,A(ia:ia+k-1, ja:ja+k-1) is not positive definite, andthe factorization could not be completed, and the solutionhas not been computed.

p?posvxSolves a symmetric or Hermitian positive definitesystem of linear equations.

Syntax

call psposvx(fact, uplo, n, nrhs, a, ia, ja, desca, af, iaf, jaf, descaf,equed, sr, sc, b, ib, jb, descb, x, ix, jx, descx, rcond, ferr, berr, work,lwork, iwork, liwork, info)

call pdposvx(fact, uplo, n, nrhs, a, ia, ja, desca, af, iaf, jaf, descaf,equed, sr, sc, b, ib, jb, descb, x, ix, jx, descx, rcond, ferr, berr, work,lwork, iwork, liwork, info)

call pcposvx(fact, uplo, n, nrhs, a, ia, ja, desca, af, iaf, jaf, descaf,equed, sr, sc, b, ib, jb, descb, x, ix, jx, descx, rcond, ferr, berr, work,lwork, iwork, liwork, info)

call pzposvx(fact, uplo, n, nrhs, a, ia, ja, desca, af, iaf, jaf, descaf,equed, sr, sc, b, ib, jb, descb, x, ix, jx, descx, rcond, ferr, berr, work,lwork, iwork, liwork, info)

1831

Description

This routine uses the Cholesky factorization A=UT*U or A=L*LT to compute the solution to areal or complex system of linear equations

A(ia:ia+n-1, ja:ja+n-1)*X = B(ib:ib+n-1, jb:jb+nrhs-1),

where A(ia:ia+n-1, ja:ja+n-1) is a n-by-n matrix and X and B(ib:ib+n-1,jb:jb+nrhs-1)are n-by-nrhs matrices.


In the following comments y denotes Y(iy:iy+m-1, jy:jy+k-1) a m-by-k matrix where ycan be a, af, b and x.

The routine p?posvx performs the following steps:


diag(sr)*A*diag(sc)*inv(diag(sc))*X = diag(sr)*B

Whether or not the system will be equilibrated depends on the scaling of the matrix A, butif equilibration is used, A is overwritten by diag(sr)*A*diag(sc) and B by diag(sr)*B .


A = UT*U, if uplo = 'U', or

A = L*LT, if uplo = 'L',


3. The factored form of A is used to estimate the condition number of the matrix A. If thereciprocal of the condition number is less than machine precision, steps 4-6 are skipped



6. If equilibration was used, the matrix X is premultiplied by diag(sr) so that it solves theoriginal system before equilibration.

Input Parameters

(global) CHARACTER. Must be 'F', 'N', or 'E'.fact

1832


Specifies whether or not the factored form of the matrix Ais supplied on entry, and if not, whether the matrix A shouldbe equilibrated before it is factored.If fact = 'F': on entry, af contains the factored form ofA. If equed = 'Y', the matrix A has been equilibrated withscaling factors given by s. a and af will not be modified.If fact = 'N', the matrix A will be copied to af andfactored.If fact = 'E', the matrix A will be equilibrated if necessary,then copied to af and factored.

(global) CHARACTER. Must be 'U' or 'L'.uploIndicates whether the upper or lower triangular part of A isstored.


sub(A) (n ≥ 0).

n

(global) INTEGER. The number of right-hand sides; thenumber of columns of the distributed submatrices B and X.

(nrhs ≥ 0).

nrhs

(local)aREAL for psposvxDOUBLE PRECISION for pdposvxCOMPLEX for pcposvxDOUBLE COMPLEX for pzposvx.Pointer into the local memory to an array of local dimension(lld_a, LOCc(ja+n-1)). On entry, thesymmetric/Hermitian matrix A, except if fact = 'F' andequed = 'Y', then A must contain the equilibrated matrixdiag(sr)*A*diag(sc).If uplo = 'U', the leading n-by-n upper triangular part ofA contains the upper triangular part of the matrix A, andthe strictly lower triangular part of A is not referenced.If uplo = 'L', the leading n-by-n lower triangular part ofA contains the lower triangular part of the matrix A, and thestrictly upper triangular part of A is not referenced. A is notmodified if fact = 'F' or 'N', or if fact = 'E' and equed= 'N' on exit.

1833



ia, ja


desca

(local)afREAL for psposvxDOUBLE PRECISION for pdposvxCOMPLEX for pcposvxDOUBLE COMPLEX for pzposvx.Pointer into the local memory to an array of local dimension(lld_af, LOCc(ja+n-1)).If fact = 'F', then af is an input argument and on entrycontains the triangular factor U or L from the Choleskyfactorization A = UT*U or A = L*LT, in the same storage

format as A. If equed ≠ 'N', then af is the factored formof the equilibrated matrix diag(sr)*A*diag(sc).

(global) INTEGER. The row and column indices in the globalarray af indicating the first row and the first column of thesubmatrix AF, respectively.

iaf, jaf


descaf

(global). CHARACTER. Must be 'N' or 'Y'.equedequed is an input argument if fact = 'F'. It specifies theform of equilibration that was done:If equed = 'N', no equilibration was done (always true iffact = 'N');If equed = 'Y', equilibration was done and A has beenreplaced by diag(sr)*A*diag(sc).

(local)srREAL for psposvxDOUBLE PRECISION for pdposvxCOMPLEX for pcposvxDOUBLE COMPLEX for pzposvx.Array, DIMENSION (lld_a).

1834


The array s contains the scale factors for A. This array is aninput argument if fact = 'F' only; otherwise it is an outputargument.If equed = 'N', s is not accessed.If fact = 'F' and equed = 'Y', each element of s mustbe positive.

(local)bREAL for psposvxDOUBLE PRECISION for pdposvxCOMPLEX for pcposvxDOUBLE COMPLEX for pzposvx.Pointer into the local memory to an array of local dimension(lld_b, LOCc(jb+nrhs-1)). On entry, the n-by-nrhsright-hand side matrix B.


ib, jb

(global and local) INTEGER. Array, dimension (dlen_). Thearray descriptor for the distributed matrix B.

descb

(local)xREAL for psposvxDOUBLE PRECISION for pdposvxCOMPLEX for pcposvxDOUBLE COMPLEX for pzposvx.Pointer into the local memory to an array of local dimension(lld_x, LOCc(jx+nrhs-1)).

(global) INTEGER. The row and column indices in the globalarray x indicating the first row and the first column of thesubmatrix X, respectively.

ix, jx


descx

(local)workREAL for psposvxDOUBLE PRECISION for pdposvxCOMPLEX for pcposvxDOUBLE COMPLEX for pzposvx.Workspace array, DIMENSION (lwork).

1835


(local or global) INTEGER.lworkThe dimension of the array work. lwork is local input andmust be at least lwork = max(p?pocon(lwork),p?porfs(lwork)) + LOCr(n_a).lwork = 3*desca(lld_).If lwork = -1, then lwork is global input and a workspacequery is assumed; the routine only calculates the minimumand optimal size for all work arrays. Each of these valuesis returned in the first entry of the corresponding work array,and no error message is issued by pxerbla.

(local) INTEGER. Workspace array, dimension (liwork).iwork

(local or global)liworkINTEGER. The dimension of the array iwork. liwork is localinput and must be at least liwork = desca(lld_) liwork= LOCr(n_a).If liwork = -1, then liwork is global input and aworkspace query is assumed; the routine only calculatesthe minimum and optimal size for all work arrays. Each ofthese values is returned in the first entry of thecorresponding work array, and no error message is issuedby pxerbla.

Output Parameters

On exit, if fact = 'E' and equed = 'Y', a is overwrittenby diag(sr)*a*diag(sc).

a

If fact = 'N', then af is an output argument and on exitreturns the triangular factor U or L from the Choleskyfactorization A = UT*U or A = L*LT of the original matrixA.

af

If fact = 'E', then af is an output argument and on exitreturns the triangular factor U or L from the Choleskyfactorization A = UT*U or A = L*LT of the equilibratedmatrix A (see the description of A for the form of theequilibrated matrix).

If fact ≠ 'F' , then equed is an output argument. Itspecifies the form of equilibration that was done (see thedescription of equed in Input Arguments section).

equed

1836


This array is an output argument if fact ≠ 'F'.sr

See the description of sr in Input Arguments section.

This array is an output argument if fact ≠ 'F'.sc

See the description of sc in Input Arguments section.

On exit, if equed = 'N', b is not modified; if trans = 'N'and equed = 'R' or 'B', b is overwritten by diag(r)*b;if trans = 'T' or 'C' and equed = 'C' or 'B', b isoverwritten by diag(c)*b.

b

(local)xREAL for psposvxDOUBLE PRECISION for pdposvxCOMPLEX for pcposvxDOUBLE COMPLEX for pzposvx.If info = 0 the n-by-nrhs solution matrix X to the originalsystem of equations.

Note that A and B are modified on exit if equed ≠ 'N', andthe solution to the equilibrated system isinv(diag(sc))*X if trans = 'N' and equed = 'C' or'B', orinv(diag(sr))*X if trans = 'T' or 'C' and equed ='R' or 'B'.

(global)rcondREAL for single precision flavors.DOUBLE PRECISION for double precision flavors.An estimate of the reciprocal condition number of the matrixA after equilibration (if done). If rcond is less than themachine precision (in particular, if rcond=0), the matrix issingular to working precision. This condition is indicated bya return code of info > 0.

REAL for single precision flavors.ferrDOUBLE PRECISION for double precision flavors.Arrays, DIMENSION at least max(LOC,n_b). The estimatedforward error bounds for each solution vector X(j) (the j-thcolumn of the solution matrix X). If xtrue is the truesolution, ferr(j) bounds the magnitude of the largest entryin (X(j) - xtrue) divided by the magnitude of the largest

1837


entry in X(j). The quality of the error bound depends onthe quality of the estimate of norm(inv(A)) computed inthe code; if the estimate of norm(inv(A)) is accurate, theerror bound is guaranteed.

(local)berrREAL for single precision flavors.DOUBLE PRECISION for double precision flavors.Arrays, DIMENSION at least max(LOC,n_b). Thecomponentwise relative backward error of each solutionvector X(j) (the smallest relative change in any entry of Aor B that makes X(j) an exact solution).

(local) On exit, work(1) returns the minimal and optimalliwork.

work(1)

(global) INTEGER.infoIf info=0, the execution is successful.< 0: if info = -i, the i-th argument had an illegal value

> 0: if info = i, and i is ≤ n: if info = i, the leadingminor of order i of a is not positive definite, so thefactorization could not be completed, and the solution anderror bounds could not be computed.= n+1: rcond is less than machine precision. Thefactorization has been completed, but the matrix is singularto working precision, and the solution and error boundshave not been computed.

p?pbsvSolves a symmetric/Hermitian positive definitebanded system of linear equations.

Syntax

call pspbsv(uplo, n, bw, nrhs, a, ja, desca, b, ib, descb, work, lwork, info)

call pdpbsv(uplo, n, bw, nrhs, a, ja, desca, b, ib, descb, work, lwork, info)

call pcpbsv(uplo, n, bw, nrhs, a, ja, desca, b, ib, descb, work, lwork, info)

call pzpbsv(uplo, n, bw, nrhs, a, ja, desca, b, ib, descb, work, lwork, info)

1838

Description


A(1:n, ja:ja+n-1)*X = B(ib:ib+n-1, 1:nrhs),

where A(1:n, ja:ja+n-1) is an n-by-n real/complex banded symmetric positive definitedistributed matrix with bandwidth bw.

Cholesky factorization is used to factor a reordering of the matrix into L*L'.

Input Parameters

(global) CHARACTER. Must be 'U' or 'L'.uploIndicates whether the upper or lower triangular of A isstored.If uplo = 'U', the upper triangular A is storedIf uplo = 'L', the lower triangular of A is stored.

(global) INTEGER. The order of the distributed matrix A (n

≥ 0).

n

(global) INTEGER. The number of subdiagonals in L or U. 0

≤ bw ≤ n-1.

bw

(global) INTEGER. The number of right-hand sides; the

number of columns in B (nrhs ≥ 0).

nrhs

(local). REAL for pspbsvaDOUBLE PRECISON for pdpbsvCOMPLEX for pcpbsvDOUBLE COMPLEX for pzpbsv.Pointer into the local memory to an array with first

dimension lld_a ≥ (bw+1) (stored in desca). On entry,this array contains the local pieces of the distributed matrixsub(A) to be factored.


ja


desca

(local)b

1839


REAL for pspbsvDOUBLE PRECISON for pdpbsvCOMPLEX for pcpbsvDOUBLE COMPLEX for pzpbsv.Pointer into the local memory to an array of local lead

dimension lld_b ≥ nb. On entry, this array contains thelocal pieces of the right hand sides B(ib:ib+n-1, 1:nrhs).


ib


If 1D type (dtype_b =502), dlen ≥ 7;

If 2D type (dtype_b =1), dlen ≥ 9.The array descriptor for the distributed matrix B.Contains information of mapping of B to memory.

(local).workREAL for pspbsvDOUBLE PRECISON for pdpbsvCOMPLEX for pcpbsvDOUBLE COMPLEX for pzpbsv.Temporary workspace. This space may be overwritten inbetween calls to routines. work must be the size given inlwork.

(local or global) INTEGER. Size of user-input workspacework. If lwork is too small, the minimal acceptable size willbe returned in work(1)and an error code is returned. lwork

≥ (nb+2*bw)*bw +max((bw*nrhs), bw*bw)

lwork

Output Parameters

On exit, this array contains information containing detailsof the factorization. Note that permutations are performedon the matrix, so that the factors returned are differentfrom those returned by LAPACK.

a

On exit, contains the local piece of the solutions distributedmatrix X.

b

1840


(global). INTEGER. If info=0, the execution is successful.info< 0: If the i-th argument is an array and the j-entry hadan illegal value, then info = -(i*100+j), if the i-thargument is a scalar and had an illegal value, then info =-i.

> 0: If info = k ≤ NPROCS, the submatrix stored onprocessor info and factored locally was not positive definite,and the factorization was not completed.If info = k > NPROCS, the submatrix stored on processorinfo-NPROCS representing interactions with otherprocessors was not positive definite, and the factorizationwas not completed.

p?ptsvSolves a symmetric or Hermitian positive definitetridiagonal system of linear equations.

Syntax

call psptsv(n, nrhs, d, e, ja, desca, b, ib, descb, work, lwork, info)

call pdptsv(n, nrhs, d, e, ja, desca, b, ib, descb, work, lwork, info)

call pcptsv(n, nrhs, d, e, ja, desca, b, ib, descb, work, lwork, info)

call pzptsv(n, nrhs, d, e, ja, desca, b, ib, descb, work, lwork, info)

Description


A(1:n, ja:ja+n-1)*X = B(ib:ib+n-1, 1:nrhs),

where A(1:n, ja:ja+n-1) is an n-by-n real tridiagonal symmetric positive definite distributedmatrix.

Cholesky factorization is used to factor a reordering of the matrix into L*L'.

Input Parameters

(global) INTEGER. The order of matrix A (n ≥ 0).n

1841


number of columns of the distributed submatrix B (nrhs ≥0).

nrhs

(local)dREAL for psptsvDOUBLE PRECISON for pdptsvCOMPLEX for pcptsvDOUBLE COMPLEX for pzptsv.Pointer to local part of global vector storing the maindiagonal of the matrix.

(local)eREAL for psptsvDOUBLE PRECISON for pdptsvCOMPLEX for pcptsvDOUBLE COMPLEX for pzptsv.Pointer to local part of global vector storing the upperdiagonal of the matrix. Globally, du(n) is not referenced,and du must be aligned with d.


ja




(local)bREAL for psptsvDOUBLE PRECISON for pdptsvCOMPLEX for pcptsvDOUBLE COMPLEX for pzptsv.Pointer into the local memory to an array of local lead

dimension lld_b ≥ nb.On entry, this array contains the local pieces of the righthand sides B(ib:ib+n-1, 1:nrhs).

1842



ib


If 1d type (dtype_b = 502), dlen ≥ 7;

If 2d type (dtype_b = 1), dlen ≥ 9.The array descriptor for the distributed matrix B.Contains information of mapping of B to memory.

(local).workREAL for psptsvDOUBLE PRECISON for pdptsvCOMPLEX for pcptsvDOUBLE COMPLEX for pzptsv.Temporary workspace. This space may be overwritten inbetween calls to routines. work must be the size given inlwork.

(local or global) INTEGER. Size of user-input workspacework. If lwork is too small, the minimal acceptable size willbe returned in work(1) and an error code is returned. lwork> (12*NPCOL+3*nb)+max((10+2*min(100,nrhs))*NPCOL+4*nrhs, 8*NPCOL).

lwork

Output Parameters

On exit, this array contains information containing thefactors of the matrix. Must be of size greater than or equalto desca(nb_).

d

On exit, this array contains information containing thefactors of the matrix. Must be of size greater than or equalto desca(nb_).

e


b


(local) INTEGER. If info=0, the execution is successful.info

1843


< 0: If the i-th argument is an array and the j-entry hadan illegal value, then info = -(i*100+j), if the i-thargument is a scalar and had an illegal value, then info =-i.

> 0: If info = k ≤ NPROCS, the submatrix stored onprocessor info and factored locally was not positive definite,and the factorization was not completed.If info = k > NPROCS, the submatrix stored on processorinfo-NPROCS representing interactions with otherprocessors was not positive definite, and the factorizationwas not completed.

p?gelsSolves overdetermined or underdetermined linearsystems involving a matrix of full rank.

Syntax

call psgels(trans, m, n, nrhs, a, ia, ja, desca, b, ib, jb, descb, work, lwork,info)

call pdgels(trans, m, n, nrhs, a, ia, ja, desca, b, ib, jb, descb, work, lwork,info)

call pcgels(trans, m, n, nrhs, a, ia, ja, desca, b, ib, jb, descb, work, lwork,info)

call pzgels(trans, m, n, nrhs, a, ia, ja, desca, b, ib, jb, descb, work, lwork,info)

Description

This routine solves overdetermined or underdetermined real/ complex linear systems involvingan m-by-n matrix sub(A) = A(ia:ia+m-1,ja:ja+n-1), or its transpose/ conjugate-transpose,using a QTQ or LQ factorization of sub(A). It is assumed that sub(A) has full rank.

The following options are provided:

1. If trans = 'N' and m ≥ n: find the least squares solution of an overdetermined system,that is, solve the least squares problem

minimize ||sub(B) - sub(A)*X||

1844

2. If trans = 'N' and m < n: find the minimum norm solution of an underdetermined systemsub(A)*X = sub(B).

3. If trans = 'T' and m ≥ n: find the minimum norm solution of an undetermined systemsub(A)T*X = sub(B).

4. If trans = 'T' and m < n: find the least squares solution of an overdetermined system,that is, solve the least squares problem

minimize ||sub(B) - sub(A)T*X||,

where sub(B) denotes B(ib:ib+m-1, jb:jb+nrhs-1) when trans = 'N' andB(ib:ib+n-1, jb:jb+nrhs-1) otherwise. Several right hand side vectors b and solutionvectors x can be handled in a single call; when trans = 'N', the solution vectors are storedas the columns of the n-by-nrhs right hand side matrix sub(B) and the m-by-nrhs right handside matrix sub(B) otherwise.

Input Parameters

(global) CHARACTER. Must be 'N', or 'T'.transIf trans = 'N', the linear system involves matrix sub(A);If trans = 'T', the linear system involves the transposedmatrix AT (for real flavors only).


submatrix sub (A) (m ≥ 0).

m


submatrix sub (A) (n ≥ 0).

n

(global) INTEGER. The number of right-hand sides; thenumber of columns in the distributed submatrices sub(B)

and X. (nrhs ≥ 0).

nrhs

(local)aREAL for psgelsDOUBLE PRECISION for pdgelsCOMPLEX for pcgelsDOUBLE COMPLEX for pzgels.Pointer into the local memory to an array of dimension(lld_a, LOCc(ja+n-1)). On entry, contains the m-by-nmatrix A.

1845


ia, ja


desca

(local)bREAL for psgelsDOUBLE PRECISION for pdgelsCOMPLEX for pcgelsDOUBLE COMPLEX for pzgels.Pointer into the local memory to an array of local dimension(lld_b, LOCc(jb+nrhs-1)). On entry, this array containsthe local pieces of the distributed matrix B of right-handside vectors, stored columnwise; sub(B) is m-by-nrhs iftrans='N', and n-by-nrhs otherwise.


ib, jb


descb

(local)workREAL for psgelsDOUBLE PRECISION for pdgelsCOMPLEX for pcgelsDOUBLE COMPLEX for pzgels.Workspace array with dimension lwork.

(local or global) INTEGER.lworkThe dimension of the array work lwork is local input and

must be at least lwork ≥ ltau + max(lwf, lws), whereif m > n, thenltau = numroc(ja+min(m,n)-1, nb_a, MYCOL,csrc_a, NPCOL),lwf = nb_a*(mpa0 + nqa0 + nb_a)lws = max((nb_a*(nb_a-1))/2, (nrhsqb0 +mpb0)*nb_a) + nb_a*nb_aelse

1846


ltau = numroc(ia+min(m,n)-1, mb_a, MYROW,rsrc_a, NPROW),lwf = mb_a * (mpa0 + nqa0 + mb_a)lws = max((mb_a*(mb_a-1))/2, (npb0 + max(nqa0 +numroc(numroc(n+iroffb, mb_a, 0, 0, NPROW),mb_a, 0, 0, lcmp), nrhsqb0))*mb_a) + mb_a*mb_aend if,where lcmp = lcm/NPROW with lcm = ilcm(NPROW,NPCOL),iroffa = mod(ia-1, mb_a),icoffa = mod(ja-1, nb_a),iarow = indxg2p(ia, mb_a, MYROW, rsrc_a, NPROW),iacol= indxg2p(ja, nb_a, MYROW, rsrc_a, NPROW)mpa0 = numroc(m+iroffa, mb_a, MYROW, iarow,NPROW),nqa0 = numroc(n+icoffa, nb_a, MYCOL, iacol,NPCOL),iroffb = mod(ib-1, mb_b),icoffb = mod(jb-1, nb_b),ibrow = indxg2p(ib, mb_b, MYROW, rsrc_b, NPROW),ibcol = indxg2p(jb, nb_b, MYCOL, csrc_b, NPCOL),mpb0 = numroc(m+iroffb, mb_b, MYROW, icrow,NPROW),nqb0 = numroc(n+icoffb, nb_b, MYCOL, ibcol,NPCOL),ilcm, indxg2p and numroc are ScaLAPACK tool functions;MYROW, MYCOL, NPROW, and NPCOL can be determined bycalling the subroutine blacs_gridinfo.If lwork = -1, then lwork is global input and a workspacequery is assumed; the routine only calculates the minimumand optimal size for all work arrays. Each of these valuesis returned in the first entry of the corresponding work array,and no error message is issued by pxerbla.

1847


Output Parameters

On exit, If m ≥ n, sub(A) is overwritten by the details of itsQR factorization as returned by p?geqrf; if m < n, sub(A)is overwritten by details of its LQ factorization as returnedby p?gelqf.

a

On exit, sub(B) is overwritten by the solution vectors, stored

columnwise: if trans = 'N' and m ≥ n, rows 1 to n ofsub(B) contain the least squares solution vectors; the

b

residual sum of squares for the solution in each column isgiven by the sum of squares of elements n+1 to m in thatcolumn;If trans = 'N' and m < n, rows 1 to n of sub(B) containthe minimum norm solution vectors;

If trans = 'T' and m ≥ n, rows 1 to m of sub(B) containthe minimum norm solution vectors; if trans = 'T' andm < n, rows 1 to m of sub(B) contain the least squaressolution vectors; the residual sum of squares for the solutionin each column is given by the sum of squares of elementsm+1 to n in that column.


work(1)


1848

p?syevComputes selected eigenvalues and eigenvectorsof a symmetric matrix.

Syntax

call pssyev( jobz, uplo, n, a, ia, ja, desca, w, z, iz, jz, descz, work, lwork,info )

call pdsyev( jobz, uplo, n, a, ia, ja, desca, w, z, iz, jz, descz, work, lwork,info )

Description

This routine computes all eigenvalues and, optionally, eigenvectors of a real symmetric matrixA by calling the recommended sequence of ScaLAPACK routines.

In its present form, the routine assumes a homogeneous system and makes no checks forconsistency of the eigenvalues or eigenvectors across the different processes. Because of this,it is possible that a heterogeneous system may return incorrect results without any errormessages.

Input Parameters

np = the number of rows local to a given process.

nq = the number of columns local to a given process.

(global). CHARACTER. Must be 'N' or 'V'. Specifies if it isnecessary to compute the eigenvectors:

jobz

If jobz ='N', then only eigenvalues are computed.If jobz ='V', then eigenvalues and eigenvectors arecomputed.

(global). CHARACTER. Must be 'U' or 'L'. Specifies whetherthe upper or lower triangular part of the symmetric matrixA is stored:

uplo

If uplo = 'U', a stores the upper triangular part of A.If uplo = 'L', a stores the lower triangular part of A.

(global) INTEGER. The number of rows and columns of the

matrix A (n ≥ 0).

n

(local)a

1849


REAL for pssyev.DOUBLE PRECISION for pdsyev.Block cyclic array of global dimension (n, n) and localdimension (lld_a, LOC c(ja+n-1)). On entry, thesymmetric matrix A.If uplo = 'U', only the upper triangular part of A is usedto define the elements of the symmetric matrix.If uplo = 'L', only the lower triangular part of A is usedto define the elements of the symmetric matrix.


ia, ja


desca


iz, jz


descz

(local)workREAL for pssyev.DOUBLE PRECISION for pdsyev.Array, DIMENSION (lwork).

(local) INTEGER. See below for definitions of variables usedto define lwork.

lwork

If no eigenvectors are requested (jobz = 'N'), then lwork

≥ 5*n + sizesytrd + 1,where sizesytrd is the workspace for p?sytrd and ismax(NB*(np +1), 3*NB).If eigenvectors are requested (jobz = 'V') then theamount of workspace required to guarantee that alleigenvectors are computed is:qrmem = 2*n-2lwmin = 5*n + n*ldc + max(sizemqrleft, qrmem) +1Variable definitions:

1850


nb = desca(mb_) = desca(nb_) = descz(mb_) =descz(nb_);nn = max(n, nb, 2);desca(rsrc_) = desca(rsrc_) = descz(rsrc_) =descz(csrc_) = 0np = numroc(nn, nb, 0, 0, NPROW)nq = numroc(max(n, nb, 2), nb, 0, 0, NPCOL)nrc = numroc(n, nb, myprowc, 0, NPROCS)ldc = max(1, nrc)sizemqrleft is the workspace for p?ormtr when its sideargument is 'L'.myprowc is defined when a new context is created as follows:call blacs_get(desca(ctxt_), 0, contextc)call blacs_gridinit(contextc, 'R', NPROCS, 1)call blacs_gridinfo(contextc, nprowc, npcolc,myprowc, mypcolc)If lwork = -1, then lwork is global input and a workspacequery is assumed; the routine only calculates the minimumand optimal size for all work arrays. Each of these valuesis returned in the first entry of the corresponding work array,and no error message is issued by pxerbla.

Output Parameters

On exit, the lower triangle (if uplo='L') or the uppertriangle (if uplo='U') of A, including the diagonal, isdestroyed.

a

(global). REAL for pssyevwDOUBLE PRECISION for pdsyevArray, DIMENSION (n).On normal exit, the first m entries contain the selectedeigenvalues in ascending order.

(local). REAL for pssyevzDOUBLE PRECISION for pdsyevArray, global dimension (n, n), local dimension (lld_z,LOCc(jz+n-1)). If jobz = 'V', then on normal exit thefirst m columns of z contain the orthonormal eigenvectorsof the matrix corresponding to the selected eigenvalues.

1851


If jobz = 'N', then z is not referenced.

On output, work(1) returns the workspace needed toguarantee completion. If the input parameters are incorrect,work(1) may also be incorrect.

work(1)

If jobz = 'N' work(1) = minimal (optimal) amount ofworkspaceIf jobz = 'V' work(1) = minimal workspace required togenerate all the eigenvectors.

(global) INTEGER.infoIf info = 0, the execution is successful.If info < 0: If the i-th argument is an array and thej-entry had an illegal value, then info = -(i*100+j), ifthe i-th argument is a scalar and had an illegal value, theninfo = -i.If info > 0:If info= 1 through n, the i-th eigenvalue did not convergein ?steqr2 after a total of 30n iterations.If info= n+1, then p?syev has detected heterogeneity byfinding that eigenvalues were not identical across theprocess grid. In this case, the accuracy of the results fromp?syev cannot be guaranteed.

p?syevxComputes selected eigenvalues and, optionally,eigenvectors of a symmetric matrix.

Syntax

call pssyevx(jobz, range, uplo, n, a, ia, ja, desca, vl, vu, il, iu, abstol,m, nz, w, orfac, z, iz, jz, descz, work, lwork, iwork, liwork, ifail, iclustr,gap, info)

call pdsyevx(jobz, range, uplo, n, a, ia, ja, desca, vl, vu, il, iu, abstol,m, nz, w, orfac, z, iz, jz, descz, work, lwork, iwork, liwork, ifail, iclustr,gap, info)

1852

Description

This routine computes selected eigenvalues and, optionally, eigenvectors of a real symmetricmatrix A by calling the recommended sequence of ScaLAPACK routines. Eigenvalues andeigenvectors can be selected by specifying either a range of values or a range of indices forthe desired eigenvalues.

Input Parameters



(global). CHARACTER*1. Must be 'N' or 'V'. Specifies if itis necessary to compute the eigenvectors:

jobz

If jobz ='N', then only eigenvalues are computed.If jobz ='V', then eigenvalues and eigenvectors arecomputed.

(global). CHARACTER*1. Must be 'A', 'V', or 'I'.rangeIf range = 'A', all eigenvalues will be found.If range = 'V', all eigenvalues in the half-open interval[vl, vu] will be found.If range = 'I', the eigenvalues with indices il throughiu will be found.

(global). CHARACTER*1. Must be 'U' or 'L'.uploSpecifies whether the upper or lower triangular part of thesymmetric matrix A is stored:If uplo = 'U', a stores the upper triangular part of A.If uplo = 'L', a stores the lower triangular part of A.


matrix A (n ≥ 0).

n

(local). REAL for pssyevxaDOUBLE PRECISION for pdsyevx.Block cyclic array of global dimension (n, n) and localdimension (lld_a, LOCc(ja+n-1)). On entry, thesymmetric matrix A.If uplo = 'U', only the upper triangular part of A is usedto define the elements of the symmetric matrix.If uplo = 'L', only the lower triangular part of A is usedto define the elements of the symmetric matrix.

1853



ia, ja


desca

(global)vl, vuREAL for pssyevxDOUBLE PRECISION for pdsyevx.If range = 'V', the lower and upper bounds of the interval

to be searched for eigenvalues; vl ≤ vu. Not referenced ifrange = 'A' or 'I'.

(global) INTEGER.il, iuIf range ='I', the indices of the smallest and largesteigenvalues to be returned.

Constraints: il ≥ 1

min(il,n) ≤ iu ≤ nNot referenced if range = 'A' or 'V'.

(global). REAL for pssyevxabstolDOUBLE PRECISION for pdsyevx.If jobz='V', setting abstol to p?lamch(context, 'U')yields the most orthogonal eigenvectors.The absolute error tolerance for the eigenvalues. Anapproximate eigenvalue is accepted as converged when itis determined to lie in an interval [a, b] of width less thanor equal toabstol + eps * max(|a|,|b|),where eps is the machine precision. If abstol is less thanor equal to zero, then eps*norm(T) will be used in its place,where norm(T) is the 1-norm of the tridiagonal matrixobtained by reducing A to tridiagonal form.Eigenvalues will be computed most accurately when abstolis set to twice the underflow threshold 2*p?lamch('S')not zero. If this routine returns with((mod(info,2).ne.0).or. * (mod(info/8,2).ne.0)),indicating that some eigenvalues or eigenvectors did notconverge, try setting abstol to 2*p?lamch('S').

1854


(global). REAL for pssyevxorfacDOUBLE PRECISION for pdsyevx.Specifies which eigenvectors should be reorthogonalized.Eigenvectors that correspond to eigenvalues which are withintol=orfac*norm(A)of each other are to bereorthogonalized. However, if the workspace is insufficient(see lwork), tol may be decreased until all eigenvectorsto be reorthogonalized can be stored in one process. Noreorthogonalization will be done if orfac equals zero. Adefault value of 1.0e-3 is used if orfac is negative. orfacshould be identical on all processes.


iz, jz

(global and local) INTEGER array, dimension (dlen_). Thearray descriptor for the distributed matrix Z.descz(ctxt_)must equal desca(ctxt_).

descz

(local)workREAL for pssyevx.DOUBLE PRECISION for pdsyevx.Array, DIMENSION (lwork).

(local) INTEGER. The dimension of the array work.lworkSee below for definitions of variables used to define lwork.If no eigenvectors are requested (jobz = 'N'), then lwork

≥ 5*n + max(5*nn, NB*(np0 + 1)).If eigenvectors are requested (jobz = 'V'), then theamount of workspace required to guarantee that alleigenvectors are computed is:

lwork ≥ 5*n + max(5*nn, np0*mq0 + 2*NB*NB) +iceil(neig, NPROW*NPCOL)*nnThe computed eigenvectors may not be orthogonal if theminimal workspace is supplied and orfac is too small. Ifyou want to guarantee orthogonality (at the cost ofpotentially poor performance) you should add the followingto lwork:(clustersize-1)*n,

1855


where clustersize is the number of eigenvalues in thelargest cluster, where a cluster is defined as a set of closeeigenvalues:

{w(k),..., w(k+clustersize-1)| w(j+1) ≤ w(j)) +orfac*2*norm(A)},whereneig = number of eigenvectors requestednb = desca(mb_) = desca(nb_) = descz(mb_) =descz(nb_);nn = max(n, nb, 2);desca(rsrc_) = desca(nb_) = descz(rsrc_) =descz(csrc_) = 0;np0 = numroc(nn, nb, 0, 0, NPROW);mq0 = numroc(max(neig, nb, 2), nb, 0, 0, NPCOL)iceil(x, y) is a ScaLAPACK function returning ceiling(x/y)If lwork is too small to guarantee orthogonality, p?syevxattempts to maintain orthogonality in the clusters with thesmallest spacing between the eigenvalues.If lwork is too small to compute all the eigenvectorsrequested, no computation is performed and info= -23 isreturned.Note that when range='V', number of requestedeigenvectors are not known until the eigenvalues arecomputed. In this case and if lwork is large enough tocompute the eigenvalues, p?sygvx computes theeigenvalues and as many eigenvectors as possible.Relationship between workspace, orthogonality &performance:Greater performance can be achieved if adequate workspaceis provided. In some situations, performance can decreaseas the provided workspace increases above the workspaceamount shown below:

lwork ≥ max(lwork, 5*n + nsytrd_lwopt),where lwork, as defined previously, depends upon thenumber of eigenvectors requested, andnsytrd_lwopt = n + 2*(anb+1)*(4*nps+2) + (nps +3)*nps;

1856


anb = pjlaenv(desca(ctxt_), 3, 'p?syttrd', 'L',0, 0, 0, 0);sqnpc = int(sqrt(dble(NPROW * NPCOL)));nps = max(numroc(n, 1, 0, 0, sqnpc), 2*anb);numroc is a ScaLAPACK tool functions;pjlaenv is a ScaLAPACK environmental inquiry functionMYROW, MYCOL, NPROW and NPCOL can be determined bycalling the subroutine blacs_gridinfo.For large n, no extra workspace is needed, however thebiggest boost in performance comes for small n, so it is wiseto provide the extra workspace (typically less than amegabyte per process).If clustersize > n/sqrt(NPROW*NPCOL), then providingenough space to compute all the eigenvectors orthogonallywill cause serious degradation in performance. At the limit(that is, clustersize = n-1) p?stein will perform nobetter than ?stein on single processor.For clustersize = n/sqrt(NPROW*NPCOL)reorthogonalizing all eigenvectors will increase the totalexecution time by a factor of 2 or more.For clustersize > n/sqrt(NPROW*NPCOL) execution timewill grow as the square of the cluster size, all other factorsremaining equal and assuming enough workspace. Lessworkspace means less reorthogonalization but fasterexecution.If lwork = -1, then lwork is global input and a workspacequery is assumed; the routine only calculates the sizerequired for optimal performance for all work arrays. Eachof these values is returned in the first entry of thecorresponding work arrays, and no error message is issuedby pxerbla.

(local) INTEGER. Workspace array.iwork

(local) INTEGER, dimension of iwork. liwork ≥ 6*nnpliwork

Where: nnp = max(n, NPROW*NPCOL + 1, 4)If liwork = -1, then liwork is global input and aworkspace query is assumed; the routine only calculatesthe minimum and optimal size for all work arrays. Each of

1857


these values is returned in the first entry of thecorresponding work array, and no error message is issuedby pxerbla.

Output Parameters

On exit, the lower triangle (if uplo = 'L') or the uppertriangle (if uplo = 'U')of A, including the diagonal, isoverwritten.

a

(global) INTEGER. The total number of eigenvalues found;

0 ≤ m ≤ n.

m

(global) INTEGER. Total number of eigenvectors computed.

0 ≤ nz ≤ m.

nz

The number of columns of z that are filled.

If jobz ≠ 'V', nz is not referenced.If jobz = 'V', nz = m unless the user supplies insufficientspace and p?syevx is not able to detect this beforebeginning computation. To get all the eigenvectorsrequested, the user must supply both sufficient space tohold the eigenvectors in z (m.le.descz(n_)) and sufficientworkspace to compute them. (See lwork). p?syevx isalways able to detect insufficient space without computationunless range.eq.'V'.

(global). REAL for pssyevxwDOUBLE PRECISION for pdsyevx.Array, DIMENSION (n). The first m elements contain theselected eigenvalues in ascending order.

(local). REAL for pssyevxzDOUBLE PRECISION for pdsyevx.Array, global dimension (n, n), local dimension (lld_z,LOCc(jz+n-1)).If jobz = 'V', then on normal exit the first m columns ofz contain the orthonormal eigenvectors of the matrixcorresponding to the selected eigenvalues. If an eigenvectorfails to converge, then that column of z contains the latestapproximation to the eigenvector, and the index of theeigenvector is returned in ifail.

1858


If jobz = 'N', then z is not referenced.

On exit, returns workspace adequate workspace to allowoptimal performance.

work(1)

On return, iwork(1) contains the amount of integerworkspace required.

iwork(1)

(global) INTEGER.ifailArray, DIMENSION (n).If jobz = 'V', then on normal exit, the first m elements ofifail are zero. If (mod(info,2). ne.0) on exit, thenifail contains the indices of the eigenvectors that failedto converge.If jobz = 'N', then ifail is not referenced.

(global) INTEGER. Array, DIMENSION (2*NPROW*NPCOL)iclustrThis array contains indices of eigenvectors correspondingto a cluster of eigenvalues that could not be reorthogonalizeddue to insufficient workspace (see lwork, orfac and info).Eigenvectors corresponding to clusters of eigenvaluesindexed iclustr(2*i-1) to iclustr(2*i), could notbe reorthogonalized due to lack of workspace. Hence theeigenvectors corresponding to these clusters may not beorthogonal. iclustr() is a zero terminated array.(iclustr(2*k).ne.0. and. iclustr(2*k+1).eq.0)if and only if k is the number of clusters.iclustr is not referenced if jobz = 'N'.

(global)gapREAL for pssyevxDOUBLE PRECISION for pdsyevx.Array, DIMENSION (NPROW*NPCOL)This array contains the gap between eigenvalues whoseeigenvectors could not be reorthogonalized. The outputvalues in this array correspond to the clusters indicated bythe array iclustr. As a result, the dot product betweeneigenvectors corresponding to the ith cluster may be ashigh as (C*n)/gap(i) where C is a small constant.

(global) INTEGER.infoIf info = 0, the execution is successful.If info < 0:

1859

If the i-th argument is an array and the j-entry had anillegal value, then info = -(i*100+j), if the i-thargument is a scalar and had an illegal value, then info =-i.If info > 0: if (mod(info,2).ne.0), then one or moreeigenvectors failed to converge. Their indices are stored inifail. Ensure abstol=2.0*p?lamch('U').If (mod(info/2,2).ne.0), then eigenvectors correspondingto one or more clusters of eigenvalues could not bereorthogonalized because of insufficient workspace.Theindices of the clusters are stored in the array iclustr.If (mod(info/4,2).ne.0), then space limit preventedp?syevxf rom computing all of the eigenvectors betweenvl and vu. The number of eigenvectors computed is returnedin nz.If (mod(info/8,2).ne.0), then p?stebz failed to computeeigenvalues. Ensure abstol=2.0*p?lamch('U').

p?heevxComputes selected eigenvalues and, optionally,eigenvectors of a Hermitian matrix.

Syntax

call pcheevx( jobz, range, uplo, n, a, ia, ja, desca, vl, vu, il, iu, abstol,m, nz, w, orfac, z, iz, jz, descz, work, lwork, rwork, lrwork, iwork, liwork,ifail, iclustr, gap, info )

call pzheevx( jobz, range, uplo, n, a, ia, ja, desca, vl, vu, il, iu, abstol,m, nz, w, orfac, z, iz, jz, descz, work, lwork, rwork, lrwork, iwork, liwork,ifail, iclustr, gap, info )

Description

This routine computes selected eigenvalues and, optionally, eigenvectors of a complex Hermitianmatrix A by calling the recommended sequence of ScaLAPACK routines. Eigenvalues andeigenvectors can be selected by specifying either a range of values or a range of indices forthe desired eigenvalues.

1860


Input Parameters



(global). CHARACTER*1. Must be 'N' or 'V'.jobzSpecifies if it is necessary to compute the eigenvectors:If jobz ='N', then only eigenvalues are computed.If jobz ='V', then eigenvalues and eigenvectors arecomputed.

(global). CHARACTER*1. Must be 'A', 'V', or 'I'.rangeIf range = 'A', all eigenvalues will be found.If range = 'V', all eigenvalues in the half-open interval[vl, vu] will be found.If range = 'I', the eigenvalues with indices il throughiu will be found.

(global). CHARACTER*1. Must be 'U' or 'L'.uploSpecifies whether the upper or lower triangular part of theHermitian matrix A is stored:If uplo = 'U', a stores the upper triangular part of A.If uplo = 'L', a stores the lower triangular part of A.


matrix A (n ≥ 0).

n

(local).aCOMPLEX for pcheevxDOUBLE COMPLEX for pzheevx.Block cyclic array of global dimension (n, n) and localdimension (lld_a, LOC c(ja+n-1)). On entry, theHermitian matrix A.If uplo = 'U', only the upper triangular part of A is usedto define the elements of the symmetric matrix.If uplo = 'L', only the lower triangular part of A is usedto define the elements of the Hermitian matrix.


ia, ja

1861


(global and local) INTEGER array, dimension (dlen_). Thearray descriptor for the distributed matrix A. Ifdesca(ctxt_) is incorrect, p?heevx cannot guaranteecorrect error reporting

desca

(global)vl, vuREAL for pcheevxDOUBLE PRECISION for pzheevx.If range = 'V', the lower and upper bounds of the intervalto be searched for eigenvalues; not referenced if range ='A' or 'I'.

(global)il, iuINTEGER. If range ='I', the indices of the smallest andlargest eigenvalues to be returned.Constraints:

il ≥ 1; min(il,n) ≤ iu ≤ n.Not referenced if range = 'A' or 'V'.

(global).abstolREAL for pcheevxDOUBLE PRECISION for pzheevx.If jobz='V', setting abstol to p?lamch(context, 'U')yields the most orthogonal eigenvectors.The absolute error tolerance for the eigenvalues. Anapproximate eigenvalue is accepted as converged when itis determined to lie in an interval [a, b] of width less thanor equal to abstol+eps*max(|a|,|b|), where eps is themachine precision. If abstol is less than or equal to zero,then eps*norm(T) will be used in its place, where norm(T)is the 1-norm of the tridiagonal matrix obtained by reducingA to tridiagonal form.Eigenvalues are computed most accurately when abstol isset to twice the underflow threshold 2*p?lamch('S'), notzero. If this routine returns with((mod(info,2).ne.0).or.(mod(info/8,2).ne.0)),indicating that some eigenvalues or eigenvectors did notconverge, try setting abstol to 2*p?lamch('S').

(global). REAL for pcheevxorfacDOUBLE PRECISION for pzheevx.

1862


Specifies which eigenvectors should be reorthogonalized.Eigenvectors that correspond to eigenvalues which are withintol=orfac*norm(A) of each other are to bereorthogonalized. However, if the workspace is insufficient(see lwork), tol may be decreased until all eigenvectorsto be reorthogonalized can be stored in one process. Noreorthogonalization will be done if orfac equals zero. Adefault value of 1.0e-3 is used if orfac is negative.orfac should be identical on all processes.


iz, jz

(global and local) INTEGER array, dimension (dlen_). Thearray descriptor for the distributed matrix Z.descz( ctxt_) must equal desca( ctxt_ ).

descz

(local)workCOMPLEX for pcheevxDOUBLE COMPLEX for pzheevx.Array, DIMENSION (lwork).

(local). INTEGER. The dimension of the array work.lworkIf only eigenvalues are requested:

lwork ≥ n + max(nb*(np0 + 1), 3)If eigenvectors are requested:

lwork ≥ n + (np0+mq0+nb)*nbwith nq0 = numroc(nn, nb, 0, 0, NPCOL).

lwork ≥ 5*n + max(5*nn, np0*mq0+2*nb*nb) +iceil(neig, NPROW*NPCOL)*nnFor optimal performance, greater workspace is needed, thatis

lwork ≥ max(lwork, nhetrd_lwork)where lwork is as defined above, and nhetrd_lwork = n+ 2*(anb+1)*(4*nps+2) + (nps+1)*npsictxt = desca(ctxt_)anb = pjlaenv(ictxt, 3, 'pchettrd', 'L', 0, 0,0, 0)sqnpc = sqrt(dble(NPROW * NPCOL))

1863


nps = max(numroc(n, 1, 0, 0, sqnpc), 2*anb)If lwork = -1, then lwork is global input and a workspacequery is assumed; the routine only calculates the sizerequired for optimal performance for all work arrays. Eachof these values is returned in the first entry of thecorresponding work arrays, and no error message is issuedby pxerbla.

(local)rworkREAL for pcheevxDOUBLE PRECISION for pzheevx.Workspace array, DIMENSION (lrwork).

(local) INTEGER. The dimension of the array work.lrworkSee below for definitions of variables used to define lwork.If no eigenvectors are requested (jobz = 'N'), then

lrwork ≥ 5*nn+4*n.If eigenvectors are requested (jobz = 'V'), then theamount of workspace required to guarantee that alleigenvectors are computed is:

lrwork ≥ 4*n + max(5*nn, np0*mq0+2*nb*nb) +iceil(neig, NPROW*NPCOL)*nnThe computed eigenvectors may not be orthogonal if theminimal workspace is supplied and orfac is too small. Ifyou want to guarantee orthogonality (at the cost ofpotentially poor performance) you should add the followingvalues to lrwork:(clustersize-1)*n,where clustersize is the number of eigenvalues in thelargest cluster, where a cluster is defined as a set of closeeigenvalues:

{w(k),..., w(k+clustersize-1)|w(j+1) ≤w(j)+orfac*2*norm(A)}.Variable definitions:neig = number of eigenvectors requested;nb = desca(mb_) = desca(nb_) = descz(mb_) =descz(nb_);nn = max(n, NB, 2);

1864


desca(rsrc_) = desca(nb_) = descz(rsrc_) =descz(csrc_) = 0;np0 = numroc(nn, nb, 0, 0, NPROW);mq0 = numroc(max(neig, nb, 2), nb, 0, 0, NPCOL);iceil(x, y) is a ScaLAPACK function returning ceiling(x/y)When lrwork is too small:If lwork is too small to guarantee orthogonality, p?heevxattempts to maintain orthogonality in the clusters with thesmallest spacing between the eigenvalues. If lwork is toosmall to compute all the eigenvectors requested, nocomputation is performed and info= -23 is returned. Notethat when range='V', p?heevx does not know how manyeigenvectors are requested until the eigenvalues arecomputed. Therefore, when range='V' and as long as lworkis large enough to allow p?heevx to compute theeigenvalues, p?heevx will compute the eigenvalues and asmany eigenvectors as it can.Relationship between workspace, orthogonality andperformance:

If clustersize ≥ n/sqrt(NPROW*NPCOL), then providingenough space to compute all the eigenvectors orthogonallywill cause serious degradation in performance. In the limit(that is, clustersize = n-1) p?stein will perform nobetter than ?stein on 1 processor.For clustersize = n/sqrt(NPROW*NPCOL)reorthogonalizing all eigenvectors will increase the totalexecution time by a factor of 2 or more.For clustersize > n/sqrt(NPROW*NPCOL) execution timewill grow as the square of the cluster size, all other factorsremaining equal and assuming enough workspace. Lessworkspace means less reorthogonalization but fasterexecution.If lwork = -1, then lwork is global input and a workspacequery is assumed; the routine only calculates the sizerequired for optimal performance for all work arrays. Eachof these values is returned in the first entry of thecorresponding work arrays, and no error message is issuedby pxerbla.

1865



(local) INTEGER, dimension of iwork.liwork

liwork ≥ 6*nnpWhere: nnp = max(n, NPROW*NPCOL+1, 4)If liwork = -1, then liwork is global input and aworkspace query is assumed; the routine only calculatesthe minimum and optimal size for all work arrays. Each ofthese values is returned in the first entry of thecorresponding work array, and no error message is issuedby pxerbla.

Output Parameters

On exit, the lower triangle (if uplo = 'L'), or the uppertriangle (if uplo = 'U') of A, including the diagonal, isoverwritten.

a

(global) INTEGER. The total number of eigenvalues found;

0 ≤ m ≤ n.

m

(global) INTEGER. Total number of eigenvectors computed.

0 ≤ nz ≤ m.

nz

The number of columns of z that are filled.

If jobz ≠ 'V', nz is not referenced.If jobz = 'V', nz = m unless the user supplies insufficientspace and p?heevx is not able to detect this beforebeginning computation. To get all the eigenvectorsrequested, the user must supply both sufficient space tohold the eigenvectors in z (m.le.descz(n_)) and sufficientworkspace to compute them. (See lwork). p?heevx isalways able to detect insufficient space without computationunless range.eq.'V'.

(global).wREAL for pcheevxDOUBLE PRECISION for pzheevx.Array, DIMENSION (n). The first m elements contain theselected eigenvalues in ascending order.

(local).z

1866


COMPLEX for pcheevxDOUBLE COMPLEX for pzheevx.Array, global dimension (n, n), local dimension (lld_z,LOCc(jz+n-1)).If jobz ='V', then on normal exit the first m columns of zcontain the orthonormal eigenvectors of the matrixcorresponding to the selected eigenvalues. If an eigenvectorfails to converge, then that column of z contains the latestapproximation to the eigenvector, and the index of theeigenvector is returned in ifail.If jobz = 'N', then z is not referenced.

On exit, returns workspace adequate workspace to allowoptimal performance.

work(1)

(local).rworkREAL for pcheevxDOUBLE PRECISION for pzheevx.Array, DIMENSION (lrwork). On return, rwork(1) containsthe optimal amount of workspace required for efficientexecution.If jobz='N' rwork(1) = optimal amount of workspacerequired to compute eigenvalues efficiently.If jobz='V' rwork(1) = optimal amount of workspacerequired to compute eigenvalues and eigenvectors efficientlywith no guarantee on orthogonality.If range='V', it is assumed that all eigenvectors may berequired.

(local)iwork(1)On return, iwork(1) contains the amount of integerworkspace required.

(global) INTEGER.ifailArray, DIMENSION (n).If jobz ='V', then on normal exit, the first m elements ofifail are zero. If (mod(info,2).ne.0) on exit, then ifailcontains the indices of the eigenvectors that failed toconverge.If jobz = 'N', then ifail is not referenced.

(global) INTEGER.iclustrArray, DIMENSION (2*NPROW*NPCOL).

1867


This array contains indices of eigenvectors correspondingto a cluster of eigenvalues that could not be reorthogonalizeddue to insufficient workspace (see lwork, orfac and info).Eigenvectors corresponding to clusters of eigenvaluesindexed iclustr(2*i-1) to iclustr(2*i), could not bereorthogonalized due to lack of workspace. Hence theeigenvectors corresponding to these clusters may not beorthogonal. iclustr() is a zero terminated array.(iclustr(2*k).ne.0. and. iclustr(2*k+1).eq.0) ifand only if k is the number of clusters. iclustr is notreferenced if jobz = 'N'.

(global)gapREAL for pcheevxDOUBLE PRECISION for pzheevx.Array, DIMENSION (NPROW*NPCOL)This array contains the gap between eigenvalues whoseeigenvectors could not be reorthogonalized. The outputvalues in this array correspond to the clusters indicated bythe array iclustr. As a result, the dot product betweeneigenvectors corresponding to the i-th cluster may be ashigh as (C*n)/gap(i) where C is a small constant.

(global) INTEGER.infoIf info = 0, the execution is successful.If info < 0:If the i-th argument is an array and the j-entry had anillegal value, then info = -(i*100+j), if the i-thargument is a scalar and had an illegal value, then info =-i.If info > 0:If (mod(info,2).ne.0), then one or more eigenvectorsfailed to converge. Their indices are stored in ifail. Ensureabstol=2.0*p?lamch('U')If (mod(info/2,2).ne.0), then eigenvectors correspondingto one or more clusters of eigenvalues could not bereorthogonalized because of insufficient workspace.Theindices of the clusters are stored in the array iclustr.

1868

If (mod(info/4,2).ne.0), then space limit preventedp?syevx from computing all of the eigenvectors betweenvl and vu. The number of eigenvectors computed is returnedin nz.If (mod(info/8,2).ne.0), then p?stebz failed to computeeigenvalues. Ensure abstol=2.0*p?lamch('U').

p?gesvdComputes the singular value decomposition of ageneral matrix, optionally computing the left and/orright singular vectors.

Syntax

call psgesvd( jobu, jobvt, m, n, a, ia, ja, desca, s, u, iu, ju, descu, vt,ivt, jvt, descvt, work, lwork, info )

call pdgesvd( jobu, jobvt, m, n, a, ia, ja, desca, s, u, iu, ju, descu, vt,ivt, jvt, descvt, work, lwork, info )

call pcgesvd( jobu, jobvt, m, n, a, ia, ja, desca, s, u, iu, ju, descu, vt,ivt, jvt, descvt, work, lwork, rwork, info )

call pzgesvd( jobu, jobvt, m, n, a, ia, ja, desca, s, u, iu, ju, descu, vt,ivt, jvt, descvt, work, lwork, rwork, info )

Description

This routine computes the singular value decomposition (SVD) of an m-by-n matrix A, optionallycomputing the left and/or right singular vectors. The SVD is written

A = U*Σ*VT,

where Σ is an m-by-n matrix that is zero except for its min(m, n) diagonal elements, U is an

m-by-m orthogonal matrix, and V is an n-by-n orthogonal matrix. The diagonal elements of Σare the singular values of A and the columns of U and V are the corresponding right and leftsingular vectors, respectively. The singular values are returned in array s in decreasing orderand only the first min(m,n) columns of U and rows of vt = VT are computed.

Input Parameters

mp = number of local rows in A and U

1869


nq = number of local columns in A and VT

size = min(m, n)

sizeq = number of local columns in U

sizep = number of local rows in VT

(global). CHARACTER*1. Specifies options for computing allor part of the matrix U.

jobu

If jobu = 'V', the first size columns of U (the left singularvectors) are returned in the array u;If jobu ='N', no columns of U (no left singular vectors)arecomputed.

(global) CHARACTER*1.jobvtSpecifies options for computing all or part of the matrix VT.If jobvt = 'V', the first size rows of VT (the right singularvectors) are returned in the array vt;If jobvt = 'N', no rows of VT(no right singular vectors)are computed.

(global) INTEGER. The number of rows of the matrix A (m

≥ 0).

m

(global) INTEGER. The number of columns in A (n ≥ 0).n

(local). REAL for psgesvdaDOUBLE PRECISION for pdgesvdCOMPLEX for pcgesvdCOMPLEX*16 for pzgesvdBlock cyclic array, global dimension (m, n), local dimension(mp, nq).work(lwork) is a workspace array.


ia, ja


desca

(global) INTEGER. The row and column indices in the globalarray a indicating the first row and the first column of thesubmatrix U, respectively.

iu, ju

1870


(global and local) INTEGER array, dimension (dlen_). Thearray descriptor for the distributed matrix U.

descu

(global) INTEGER. The row and column indices in the globalarray vt indicating the first row and the first column of thesubmatrix VT, respectively.

ivt, jvt

(global and local) INTEGER array, dimension (dlen_). Thearray descriptor for the distributed matrix VT.

descvt

(local). REAL for psgesvdworkDOUBLE PRECISION for pdgesvdCOMPLEX for pcgesvdCOMPLEX*16 for pzgesvdWorkspace array, dimension (lwork)

(local) INTEGER. The dimension of the array work;lworklwork > 2 + 6*sizeb + max(watobd, wbdtosvd),where sizeb = max(m, n), and watobd and wbdtosvdrefer, respectively, to the workspace required tobidiagonalize the matrix A and to go from the bidiagonalmatrix to the singular value decomposition U S VT.For watobd, the following holds:watobd = max(max(wp?lange,wp?gebrd),max(wp?lared2d, wp?lared1d)),where wp?lange, wp?lared1d, wp?lared2d, wp?gebrd arethe workspaces required respectively for the subprogramsp?lange, p?lared1d, p?lared2d, p?gebrd. Using thestandard notationmp = numroc(m, mb, MYROW, desca(ctxt_), NPROW),nq = numroc(n, nb, MYCOL, desca(lld_), NPCOL),the workspaces required for the above subprograms arewp?lange = mp,wp?lared1d = nq0,wp?lared2d = mp0,wp?gebrd = nb*(mp + nq + 1) + nq,where nq0 and mp0 refer, respectively, to the valuesobtained at MYCOL = 0 and MYROW = 0. In general, theupper limit for the workspace is given by a workspacerequired on processor (0,0):

watobd ≤ nb*(mp0 + nq0 + 1) + nq0.

1871


In case of a homogeneous process grid this upper limit canbe used as an estimate of the minimum workspace for everyprocessor.For wbdtosvd, the following holds:wbdtosvd = size*(wantu*nru + wantvt*ncvt) +max(w?bdsqr, max(wantu*wp?ormbrqln,wantvt*wp?ormbrprt)),wherewantu(wantvt) = 1, if left/right singular vectors are wanted,and wantu(wantvt) = 0, otherwise. w?bdsqr,wp?ormbrqln, and wp?ormbrprt refer respectively to theworkspace required for the subprograms ?bdsqr,p?ormbr(qln), and p?ormbr(prt), where qln and prt arethe values of the arguments vect, side, and trans in thecall to p?ormbr. nru is equal to the local number of rowsof the matrix U when distributed 1-dimensional "column" ofprocesses. Analogously, ncvt is equal to the local numberof columns of the matrix VT when distributed across1-dimensional "row" of processes. Calling the LAPACKprocedure ?bdsqr requiresw?bdsqr = max(1, 2*size + (2*size - 4)*max(wantu, wantvt))on every processor. Finally,wp?ormbrqln = max((nb*(nb-1))/2,(sizeq+mp)*nb)+nb*nb,wp?ormbrprt = max((mb*(mb-1))/2,(sizep+nq)*mb)+mb*mb,If lwork = -1, then lwork is global input and a workspacequery is assumed; the routine only calculates the minimumsize for the work array. The required workspace is returnedas the first element of work and no error message is issuedby pxerbla.

REAL for psgesvdrworkDOUBLE PRECISION for pdgesvdCOMPLEX for pcgesvdCOMPLEX*16 for pzgesvdWorkspace array, dimension (1 + 4*sizeb)

1872


Output Parameters

On exit, the contents of a are destroyed.a

(global). REAL for psgesvdsDOUBLE PRECISION for pdgesvdCOMPLEX for pcgesvdCOMPLEX*16 for pzgesvdArray, DIMENSION (size).

Contains the singular values of A sorted so that s(i) ≥s(i+1).

(local). REAL for psgesvduDOUBLE PRECISION for pdgesvdCOMPLEX for pcgesvdCOMPLEX*16 for pzgesvdlocal dimension (mp, sizeq), global dimension (m, size)If jobu = 'V', u contains the first min(m, n) columns of U.If jobu = 'N' or 'O', u is not referenced.

(local). REAL for psgesvdvtDOUBLE PRECISION for pdgesvdCOMPLEX for pcgesvdCOMPLEX*16 for pzgesvdlocal dimension (sizep, nq), global dimension (size, n)If jobvt = 'V', vt contains the first size rows of VTif jobu= 'N', vt is not referenced.


work

On exit, if info = 0, then rwork(1) returns the requiredsize of rwork.

rwork

(global) INTEGER.infoIf info = 0, the execution is successful.If info < 0, If info = -i, the ith parameter had an illegalvalue.If info > 0 i, then if ?bdsqr did not converge,If info = min(m,n) + 1, then p?gesvd has detectedheterogeneity by finding that eigenvalues were not identicalacross the process grid. In this case, the accuracy of theresults from p?gesvd cannot be guaranteed.

1873

p?sygvxComputes selected eigenvalues and, optionally,eigenvectors of a real generalized symmetricdefinite eigenproblem.

Syntax

call pssygvx(ibtype, jobz, range, uplo, n, a, ia, ja, desca, b, ib, jb, descb,vl, vu, il, iu, abstol, m, nz, w, orfac, z, iz, jz, descz, work, lwork, iwork,liwork, ifail, iclustr, gap, info)

call pdsygvx(ibtype, jobz, range, uplo, n, a, ia, ja, desca, b, ib, jb, descb,vl, vu, il, iu, abstol, m, nz, w, orfac, z, iz, jz, descz, work, lwork, iwork,liwork, ifail, iclustr, gap, info)

Description


sub(A)*x = λ*sub(B)*x, sub(A) sub(B)*x = λ*x, or sub(B)*sub(A)*x = λ*x.

Here x denotes eigen vectors, λ (lambda) denotes eigenvalues, sub(A) denoting A(ia:ia+n-1,ja:ja+n-1) is assumed to symmetric, and sub(B) denoting B(ib:ib+n-1, jb:jb+n-1) isalso positive definite.

Input Parameters

(global) INTEGER. Must be 1 or 2 or 3.ibtypeSpecifies the problem type to be solved:If ibtype = 1, the problem type is sub(A)*x =lambda*sub(B)*x;If ibtype = 2, the problem type is sub(A)*sub(B)*x =lambda*x;If ibtype = 3, the problem type is sub(B)*sub(A)*x =lambda*x.

(global). CHARACTER*1. Must be 'N' or 'V'.jobzIf jobz ='N', then compute eigenvalues only.If jobz ='V', then compute eigenvalues and eigenvectors.

(global). CHARACTER*1. Must be 'A' or 'V' or 'I'.range

1874


If range = 'A', the routine computes all eigenvalues.If range = 'V', the routine computes eigenvalues in theinterval: [vl, vu]If range = 'I', the routine computes eigenvalues withindices il through iu.

(global). CHARACTER*1. Must be 'U' or 'L'.uploIf uplo = 'U', arrays a and b store the upper triangles ofsub(A) and sub (B);If uplo = 'L', arrays a and b store the lower triangles ofsub(A) and sub (B).

(global). INTEGER. The order of the matrices sub(A) and

sub (B), n ≥ 0.

n

(local)aREAL for pssygvxDOUBLE PRECISION for pdsygvx.Pointer into the local memory to an array of dimension(lld_a, LOCc(ja+n-1)). On entry, this array containsthe local pieces of the n-by-n symmetric distributed matrixsub(A).If uplo = 'U', the leading n-by-n upper triangular part ofsub(A) contains the upper triangular part of the matrix.If uplo = 'L', the leading n-by-n lower triangular part ofsub(A) contains the lower triangular part of the matrix.


ia, ja

(global and local) INTEGER array, dimension (dlen_). Thearray descriptor for the distributed matrix A. Ifdesca(ctxt_) is incorrect, p?sygvx cannot guaranteecorrect error reporting.

desca

(local). REAL for pssygvxbDOUBLE PRECISION for pdsygvx.Pointer into the local memory to an array of dimension(lld_b, LOCc(jb+n-1)). On entry, this array containsthe local pieces of the n-by-n symmetric distributed matrixsub(B).

1875


If uplo = 'U', the leading n-by-n upper triangular part ofsub(B) contains the upper triangular part of the matrix.If uplo = 'L', the leading n-by-n lower triangular part ofsub(A) contains the lower triangular part of the matrix.


ib, jb

(global and local) INTEGER array, dimension (dlen_). Thearray descriptor for the distributed matrix B. descb(ctxt_)must be equal to desca(ctxt_).

descb

(global)vl, vuREAL for pssygvxDOUBLE PRECISION for pdsygvx.If range = 'V', the lower and upper bounds of the intervalto be searched for eigenvalues.If range = 'A' or 'I', vl and vu are not referenced.

(global)il, iuINTEGER.If range = 'I', the indices in ascending order of thesmallest and largest eigenvalues to be returned. Constraint:

il ≥ 1, min(il, n)≤ iu ≤ nIf range = 'A' or 'V', il and iu are not referenced.

(global)abstolREAL for pssygvxDOUBLE PRECISION for pdsygvx.If jobz='V', setting abstol to p?lamch(context, 'U')yields the most orthogonal eigenvectors.The absolute error tolerance for the eigenvalues. Anapproximate eigenvalue is accepted as converged when itis determined to lie in an interval [a,b] of width less thanor equal toabstol + eps*max(|a|,|b|),where eps is the machine precision. If abstol is less thanor equal to zero, then eps*norm(T) will be used in its place,where norm(T) is the 1-norm of the tridiagonal matrixobtained by reducing A to tridiagonal form.

1876


Eigenvalues will be computed most accurately when abstolis set to twice the underflow threshold 2*p?lamch('S')not zero. If this routine returns with((mod(info,2).ne.0).or.*(mod(info/8,2).ne.0)),indicating that some eigenvalues or eigenvectors did notconverge, try setting abstol to 2*p?lamch('S').

(global).orfacREAL for pssygvxDOUBLE PRECISION for pdsygvx.Specifies which eigenvectors should be reorthogonalized.Eigenvectors that correspond to eigenvalues which are withintol=orfac*norm(A) of each other are to bereorthogonalized. However, if the workspace is insufficient(see lwork), tol may be decreased until all eigenvectorsto be reorthogonalized can be stored in one process. Noreorthogonalization will be done if orfac equals zero. Adefault value of 1.0e-3 is used if orfac is negative. orfacshould be identical on all processes.


iz, jz

(global and local) INTEGER array, dimension (dlen_). Thearray descriptor for the distributed matrix Z.descz(ctxt_)must equal desca(ctxt_).

descz

(local)workREAL for pssygvxDOUBLE PRECISION for pdsygvx.Workspace array, dimension (lwork)

(local) INTEGER.lworkDimension of the array work. See below for definitions ofvariables used to define lwork.If no eigenvectors are requested (jobz = 'N'), then lwork

≥ 5*n + max(5*nn, NB*(np0 + 1)).If eigenvectors are requested (jobz = 'V'), then theamount of workspace required to guarantee that alleigenvectors are computed is:

1877


lwork ≥ 5*n + max(5*nn, np0*mq0 + 2*nb*nb) +iceil(neig, NPROW*NPCOL)*nn.The computed eigenvectors may not be orthogonal if theminimal workspace is supplied and orfac is too small. Ifyou want to guarantee orthogonality at the cost ofpotentially poor performance you should add the followingto lwork:(clustersize-1)*n,where clustersize is the number of eigenvalues in thelargest cluster, where a cluster is defined as a set of closeeigenvalues:

{w(k),..., w(k+clustersize-1)|w(j+1) ≤ w(j) +orfac*2*norm(A)}Variable definitions:neig = number of eigenvectors requested,nb = desca(mb_) = desca(nb_) = descz(mb_) =descz(nb_),nn = max(n, nb, 2),desca(rsrc_) = desca(nb_) = descz(rsrc_) =descz(csrc_) = 0,np0 = numroc(nn, nb, 0, 0, NPROW),mq0 = numroc(max(neig, nb, 2), nb, 0, 0, NPCOL)iceil(x, y) is a ScaLAPACK function returning ceiling(x/y)If lwork is too small to guarantee orthogonality, p?syevxattempts to maintain orthogonality in the clusters with thesmallest spacing between the eigenvalues.If lwork is too small to compute all the eigenvectorsrequested, no computation is performed and info= -23 isreturned.Note that when range='V', number of requestedeigenvectors are not known until the eigenvalues arecomputed. In this case and if lwork is large enough tocompute the eigenvalues, p?sygvx computes theeigenvalues and as many eigenvectors as possible.Greater performance can be achieved if adequate workspaceis provided. In some situations, performance can decreaseas the provided workspace increases above the workspaceamount shown below:

1878


lwork ≥ max(lwork, 5*n + nsytrd_lwopt,nsygst_lwopt), wherelwork, as defined previously, depends upon the number ofeigenvectors requested, andnsytrd_lwopt = n + 2*(anb+1)*(4*nps+2) +(nps+3)*npsnsygst_lwopt = 2*np0*nb + nq0*nb + nb*nbanb = pjlaenv(desca(ctxt_), 3, p?syttrd ', 'L',0, 0, 0, 0)sqnpc = int(sqrt(dble(NPROW * NPCOL)))nps = max(numroc(n, 1, 0, 0, sqnpc), 2*anb)NB = desca(mb_)np0 = numroc(n, nb, 0, 0, NPROW)nq0 = numroc(n, nb, 0, 0, NPCOL)numroc is a ScaLAPACK tool functions;pjlaenv is a ScaLAPACK environmental inquiry functionMYROW, MYCOL, NPROW and NPCOL can be determined bycalling the subroutine blacs_gridinfo.For large n, no extra workspace is needed, however thebiggest boost in performance comes for small n, so it is wiseto provide the extra workspace (typically less than aMegabyte per process).

If clustersize ≥ n/sqrt(NPROW*NPCOL), then providingenough space to compute all the eigenvectors orthogonallywill cause serious degradation in performance. At the limit(that is, clustersize = n-1) p?stein will perform nobetter than ?stein on a single processor.For clustersize = n/sqrt(NPROW*NPCOL)reorthogonalizing all eigenvectors will increase the totalexecution time by a factor of 2 or more.For clustersize > n/sqrt(NPROW*NPCOL) execution timewill grow as the square of the cluster size, all other factorsremaining equal and assuming enough workspace. Lessworkspace means less reorthogonalization but fasterexecution.If lwork = -1, then lwork is global input and a workspacequery is assumed; the routine only calculates the sizerequired for optimal performance for all work arrays. Each

1879


of these values is returned in the first entry of thecorresponding work arrays, and no error message is issuedby pxerbla.



liwork ≥ 6*nnpWhere:nnp = max(n, NPROW*NPCOL + 1, 4)If liwork = -1, then liwork is global input and a workspacequery is assumed; the routine only calculates the minimumand optimal size for all work arrays. Each of these valuesis returned in the first entry of the corresponding work array,and no error message is issued by pxerbla.

Output Parameters

On exit,aIf jobz = 'V', and if info = 0, sub(A) contains thedistributed matrix Z of eigenvectors. The eigenvectors arenormalized as follows:for ibtype = 1 or 2, ZT*sub(B)*Z = i;for ibtype = 3, ZT*inv(sub(B))*Z = i.If jobz = 'N', then on exit the upper triangle (if uplo='U')or the lower triangle (if uplo='L') of sub(A), including thediagonal, is destroyed.

On exit, if info ≤ n, the part of sub(B) containing thematrix is overwritten by the triangular factor U or L fromthe Cholesky factorization sub(B) = UT*U or sub(B) =L*LT.

b

(global) INTEGER. The total number of eigenvalues found,

0 ≤ m ≤ n.

m

(global) INTEGER.nz

Total number of eigenvectors computed. 0 ≤ nz ≤ m. Thenumber of columns of z that are filled.

If jobz ≠ 'V', nz is not referenced.

1880


If jobz = 'V', nz = m unless the user supplies insufficientspace and p?sygvx is not able to detect this beforebeginning computation. To get all the eigenvectorsrequested, the user must supply both sufficient space tohold the eigenvectors in z (m.le.descz(n_)) and sufficientworkspace to compute them. (See lwork below.) p?sygvxis always able to detect insufficient space withoutcomputation unless range.eq.'V'.

(global)wREAL for pssygvxDOUBLE PRECISION for pdsygvx.Array, DIMENSION (n). On normal exit, the first m entriescontain the selected eigenvalues in ascending order.

(local).zREAL for pssygvxDOUBLE PRECISION for pdsygvx.global dimension (n, n), local dimension (lld_z,LOCc(jz+n-1)).If jobz = 'V', then on normal exit the first m columns ofz contain the orthonormal eigenvectors of the matrixcorresponding to the selected eigenvalues. If an eigenvectorfails to converge, then that column of z contains the latestapproximation to the eigenvector, and the index of theeigenvector is returned in ifail.If jobz = 'N', then z is not referenced.

If jobz='N' work(1) = optimal amount of workspacerequired to compute eigenvalues efficiently

work

If jobz = 'V' work(1) = optimal amount of workspacerequired to compute eigenvalues and eigenvectors efficientlywith no guarantee on orthogonality.If range='V', it is assumed that all eigenvectors may berequired.

(global) INTEGER.ifailArray, DIMENSION (n).ifail provides additional information when info.ne.0

1881


If (mod(info/16,2).ne.0) then ifail(1) indicates theorder of the smallest minor which is not positive definite. If(mod(info,2).ne.0) on exit, then ifail contains theindices of the eigenvectors that failed to converge.If neither of the above error conditions hold and jobz ='V', then the first m elements of ifail are set to zero.

(global) INTEGER.iclustrArray, DIMENSION (2*NPROW*NPCOL). This array containsindices of eigenvectors corresponding to a cluster ofeigenvalues that could not be reorthogonalized due toinsufficient workspace (see lwork, orfac and info).Eigenvectors corresponding to clusters of eigenvaluesindexed iclustr(2*i-1) to iclustr(2*i), could not bereorthogonalized due to lack of workspace. Hence theeigenvectors corresponding to these clusters may not beorthogonal. iclustr() is a zero terminated array.(iclustr(2*k).ne.0.and. iclustr(2*k+1).eq.0) ifand only if k is the number of clusters iclustr is notreferenced if jobz = 'N'.

(global)gapREAL for pssygvxDOUBLE PRECISION for pdsygvx.Array, DIMENSION (NPROW*NPCOL). This array contains thegap between eigenvalues whose eigenvectors could not bereorthogonalized. The output values in this array correspondto the clusters indicated by the array iclustr. As a result,the dot product between eigenvectors corresponding to thei-th cluster may be as high as (C*n)/gap(i), where C isa small constant.

(global) INTEGER.infoIf info = 0, the execution is successful.If info <0: the i-th argument is an array and the j-entryhad an illegal value, then info = -(i*100+j), if the i-thargument is a scalar and had an illegal value, then info =-i.If info > 0:If (mod(info,2).ne.0), then one or more eigenvectorsfailed to converge. Their indices are stored in ifail.

1882

If (mod(info,2,2).ne.0), then eigenvectors correspondingto one or more clusters of eigenvalues could not bereorthogonalized because of insufficient workspace. Theindices of the clusters are stored in the array iclustr.If (mod(info/4,2).ne.0), then space limit preventedp?sygvx from computing all of the eigenvectors betweenvl and vu. The number of eigenvectors computed is returnedin nz.If (mod(info/8,2).ne.0), then p?stebz failed to computeeigenvalues.If (mod(info/16,2).ne.0), then B was not positivedefinite. ifail(1) indicates the order of the smallest minorwhich is not positive definite.

p?hegvxComputes selected eigenvalues and, optionally,eigenvectors of a complex generalized Hermitiandefinite eigenproblem.

Syntax

call pchegvx(ibtype, jobz, range, uplo, n, a, ia, ja, desca, b, ib, jb, descb,vl, vu, il, iu, abstol, m, nz, w, orfac, z, iz, jz, descz, work, lwork, rwork,lrwork, iwork, liwork, ifail, iclustr, gap, info)

call pzhegvx(ibtype, jobz, range, uplo, n, a, ia, ja, desca, b, ib, jb, descb,vl, vu, il, iu, abstol, m, nz, w, orfac, z, iz, jz, descz, work, lwork, rwork,lrwork, iwork, liwork, ifail, iclustr, gap, info)

Description


sub(A)*x = λ*sub(B)*x, sub(A)*sub(B)*x = λ*x, or sub(B)*sub(A)*x = λ*x.

Here sub (A) denoting A(ia:ia+n-1, ja:ja+n-1) and sub(B) are assumed to be Hermitianand sub(B) denoting B(ib:ib+n-1, jb:jb+n-1) is also positive definite.

1883


Input Parameters

(global) INTEGER. Must be 1 or 2 or 3.ibtypeSpecifies the problem type to be solved:If ibtype = 1, the problem type issub(A)*x = lambda*sub(B)*x;If ibtype = 2, the problem type issub(A)*sub(B)*x = lambda*x;If ibtype = 3, the problem type issub(B)*sub(A)*x = lambda*x.

(global). CHARACTER*1. Must be 'N' or 'V'.jobzIf jobz ='N', then compute eigenvalues only.If jobz ='V', then compute eigenvalues and eigenvectors.

(global). CHARACTER*1. Must be 'A' or 'V' or 'I'.rangeIf range = 'A', the routine computes all eigenvalues.If range = 'V', the routine computes eigenvalues in theinterval: [vl, vu]If range = 'I', the routine computes eigenvalues withindices il through iu.

(global). CHARACTER*1. Must be 'U' or 'L'.uploIf uplo = 'U', arrays a and b store the upper triangles ofsub(A) and sub (B);If uplo = 'L', arrays a and b store the lower triangles ofsub(A) and sub (B).

(global). INTEGER.n

The order of the matrices sub(A) and sub (B) (n ≥ 0).

(local)aCOMPLEX for pchegvxDOUBLE COMPLEX for pzhegvx.Pointer into the local memory to an array of dimension(lld_a, LOCc(ja+n-1)). On entry, this array containsthe local pieces of the n-by-n Hermitian distributed matrixsub(A). If uplo = 'U', the leading n-by-n upper triangularpart of sub(A) contains the upper triangular part of thematrix. If uplo = 'L', the leading n-by-n lower triangularpart of sub(A) contains the lower triangular part of thematrix.

1884


(global) INTEGER.ia, jaThe row and column indices in the global array a indicatingthe first row and the first column of the submatrix A,respectively.

(global and local) INTEGER array, dimension (dlen_).descaThe array descriptor for the distributed matrix A. Ifdesca(ctxt_) is incorrect, p?hegvx cannot guaranteecorrect error reporting.

(local).bCOMPLEX for pchegvxDOUBLE COMPLEX for pzhegvx.Pointer into the local memory to an array of dimension(lld_b, LOCc(jb+n-1)). On entry, this array containsthe local pieces of the n-by-n Hermitian distributed matrixsub(B).If uplo = 'U', the leading n-by-n upper triangular part ofsub(B) contains the upper triangular part of the matrix.If uplo = 'L', the leading n-by-n lower triangular part ofsub(B) contains the lower triangular part of the matrix.

(global) INTEGER.ib, jbThe row and column indices in the global array b indicatingthe first row and the first column of the submatrix B,respectively.

(global and local) INTEGER array, dimension (dlen_).descbThe array descriptor for the distributed matrix B.descb(ctxt_) must be equal to desca(ctxt_).

(global)vl, vuREAL for pchegvxDOUBLE PRECISION for pzhegvx.If range = 'V', the lower and upper bounds of the intervalto be searched for eigenvalues.If range = 'A' or 'I', vl and vu are not referenced.

(global)il, iuINTEGER.If range = 'I', the indices in ascending order of thesmallest and largest eigenvalues to be returned. Constraint:

il ≥ 1, min(il, n) ≤ iu ≤ n

1885



(global)abstolREAL for pchegvxDOUBLE PRECISION for pzhegvx.If jobz='V', setting abstol to p?lamch(context, 'U')yields the most orthogonal eigenvectors.The absolute error tolerance for the eigenvalues. Anapproximate eigenvalue is accepted as converged when itis determined to lie in an interval [a,b] of width less thanor equal toabstol + eps*max(|a|,|b|),where eps is the machine precision. If abstol is less thanor equal to zero, then eps*norm(T) will be used in its place,where norm(T) is the 1-norm of the tridiagonal matrixobtained by reducing A to tridiagonal form.Eigenvalues will be computed most accurately when abstolis set to twice the underflow threshold 2*p?lamch('S')not zero. If this routine returns with((mod(info,2).ne.0).or. * (mod(info/8,2).ne.0)),indicating that some eigenvalues or eigenvectors did notconverge, try setting abstol to 2*p?lamch('S').

(global).orfacREAL for pchegvxDOUBLE PRECISION for pzhegvx.Specifies which eigenvectors should be reorthogonalized.Eigenvectors that correspond to eigenvalues which are withintol=orfac*norm(A) of each other are to bereorthogonalized. However, if the workspace is insufficient(see lwork), tol may be decreased until all eigenvectorsto be reorthogonalized can be stored in one process. Noreorthogonalization will be done if orfac equals zero. Adefault value of 1.0E-3 is used if orfac is negative. orfacshould be identical on all processes.


iz, jz

1886


(global and local) INTEGER array, dimension (dlen_). Thearray descriptor for the distributed matrix Z.descz( ctxt_) must equal desca( ctxt_ ).

descz

(local)workCOMPLEX for pchegvxDOUBLE COMPLEX for pzhegvx.Workspace array, dimension (lwork)

(local).lworkINTEGER. The dimension of the array work.If only eigenvalues are requested:

lwork ≥ n+ max(NB*(np0 + 1), 3)If eigenvectors are requested:

lwork ≥ n + (np0+ mq0 + NB)*NBwith nq0 = numroc(nn, NB, 0, 0, NPCOL).For optimal performance, greater workspace is needed, thatis

lwork ≥ max(lwork, n, nhetrd_lwopt, nhegst_lwopt)where lwork is as defined above, andnhetrd_lwork = 2*(anb+1)*(4*nps+2) + (nps +1)*nps;nhegst_lwopt = 2*np0*nb + nq0*nb + nb*nbnb = desca(mb_)np0 = numroc(n, nb, 0, 0, NPROW)nq0 = numroc(n, nb, 0, 0, NPCOL)ictxt = desca(ctxt_)anb = pjlaenv(ictxt, 3, 'p?hettrd', 'L', 0, 0,0, 0)sqnpc = sqrt(dble(NPROW * NPCOL))nps = max(numroc(n, 1, 0, 0, sqnpc), 2*anb)numroc is a ScaLAPACK tool functions;pjlaenv is a ScaLAPACK environmental inquiry functionMYROW, MYCOL, NPROW and NPCOL can be determined bycalling the subroutine blacs_gridinfo.If lwork = -1, then lwork is global input and a workspacequery is assumed; the routine only calculates the sizerequired for optimal performance for all work arrays. Each

1887


of these values is returned in the first entry of thecorresponding work arrays, and no error message is issuedby pxerbla.

(local)rworkREAL for pchegvxDOUBLE PRECISION for pzhegvx.Workspace array, DIMENSION (lrwork).

(local) INTEGER. The dimension of the array rwork.lrworkSee below for definitions of variables used to define lrwork.If no eigenvectors are requested (jobz = 'N'), then

lrwork ≥ 5*nn+4*nIf eigenvectors are requested (jobz = 'V'), then theamount of workspace required to guarantee that alleigenvectors are computed is:

lrwork ≥ 4*n + max(5*nn, np0*mq0)+ iceil(neig,NPROW*NPCOL)*nnThe computed eigenvectors may not be orthogonal if theminimal workspace is supplied and orfac is too small. Ifyou want to guarantee orthogonality (at the cost ofpotentially poor performance) you should add the followingvalue to lrwork:(clustersize-1)*n,where clustersize is the number of eigenvalues in thelargest cluster, where a cluster is defined as a set of closeeigenvalues:

{w(k),..., w(k+clustersize-1)| w(j+1) ≤w(j)+orfac*2*norm(A)}Variable definitions:neig = number of eigenvectors requested;nb = desca(mb_) = desca(nb_) = descz(mb_) =descz(nb_);nn = max(n, nb, 2);desca(rsrc_) = desca(nb_) = descz(rsrc_) =descz(csrc_) = 0 ;np0 = numroc(nn, nb, 0, 0, NPROW);mq0 = numroc(max(neig, nb, 2), nb, 0, 0, NPCOL);iceil(x, y) is a ScaLAPACK function returning ceiling(x/y).

1888


When lrwork is too small:If lwork is too small to guarantee orthogonality, p?hegvxattempts to maintain orthogonality in the clusters with thesmallest spacing between the eigenvalues.If lwork is too small to compute all the eigenvectorsrequested, no computation is performed and info= -25 isreturned. Note that when range='V', p?hegvx does notknow how many eigenvectors are requested until theeigenvalues are computed. Therefore, when range='V' andas long as lwork is large enough to allow p?hegvx tocompute the eigenvalues, p?hegvx will compute theeigenvalues and as many eigenvectors as it can.Relationship between workspace, orthogonality &performance:If clustersize > n/sqrt(NPROW*NPCOL), then providingenough space to compute all the eigenvectors orthogonallywill cause serious degradation in performance. In the limit(that is, clustersize = n-1) p?stein will perform nobetter than ?stein on 1 processor.For clustersize = n/sqrt(NPROW*NPCOL)reorthogonalizing all eigenvectors will increase the totalexecution time by a factor of 2 or more.For clustersize > n/sqrt(NPROW*NPCOL) execution timewill grow as the square of the cluster size, all other factorsremaining equal and assuming enough workspace. Lessworkspace means less reorthogonalization but fasterexecution.If lwork = -1, then lrwork is global input and a workspacequery is assumed; the routine only calculates the sizerequired for optimal performance for all work arrays. Eachof these values is returned in the first entry of thecorresponding work arrays, and no error message is issuedby pxerbla.



liwork ≥ 6*nnpWhere: nnp = max(n, NPROW*NPCOL + 1, 4)

1889


Output Parameters

On exit, if jobz = 'V', then if info = 0, sub(A) containsthe distributed matrix Z of eigenvectors.

a

The eigenvectors are normalized as follows:If ibtype = 1 or 2, then ZH*sub(B)*Z = i;If ibtype = 3, then ZH*inv(sub(B))*Z = i.If jobz = 'N', then on exit the upper triangle (if uplo='U')or the lower triangle (if uplo='L') of sub(A), including thediagonal, is destroyed.

On exit, if info ≤ n, the part of sub(B) containing thematrix is overwritten by the triangular factor U or L fromthe Cholesky factorization sub(B) = UH*U, or sub(B) = L*LH.

b

(global) INTEGER. The total number of eigenvalues found,

0 ≤ m ≤ n.

m

(global) INTEGER. Total number of eigenvectors computed.0 < nz < m. The number of columns of z that are filled.

nz

If jobz ≠ 'V', nz is not referenced.If jobz = 'V', nz = m unless the user supplies insufficientspace and p?hegvx is not able to detect this beforebeginning computation. To get all the eigenvectorsrequested, the user must supply both sufficient space tohold the eigenvectors in z (m. le. descz(n_)) andsufficient workspace to compute them. (See lwork below.)The routine p?hegvx is always able to detect insufficientspace without computation unless range = 'V'.

(global)wREAL for pchegvxDOUBLE PRECISION for pzhegvx.

1890

Array, DIMENSION (n). On normal exit, the first m entriescontain the selected eigenvalues in ascending order.

(local).zCOMPLEX for pchegvxDOUBLE COMPLEX for pzhegvx.global dimension (n, n), local dimension (lld_z,LOCc(jz+n-1)).If jobz = 'V', then on normal exit the first m columns ofz contain the orthonormal eigenvectors of the matrixcorresponding to the selected eigenvalues. If an eigenvectorfails to converge, then that column of z contains the latestapproximation to the eigenvector, and the index of theeigenvector is returned in ifail.If jobz = 'N', then z is not referenced.

On exit, work(1) returns the optimal amount of workspace.work

On exit, rwork(1) contains the amount of workspacerequired for optimal efficiency

rwork

If jobz='N' rwork(1) = optimal amount of workspacerequired to compute eigenvalues efficientlyIf jobz='V' rwork(1) = optimal amount of workspacerequired to compute eigenvalues and eigenvectors efficientlywith no guarantee on orthogonality.If range='V', it is assumed that all eigenvectors may berequired when computing optimal workspace.

(global) INTEGER.ifailArray, DIMENSION (n).ifail provides additional information when info.ne.0If (mod(info/16,2).ne.0), then ifail(1) indicates theorder of the smallest minor which is not positive definite.If (mod(info,2).ne.0) on exit, then ifail(1) containsthe indices of the eigenvectors that failed to converge.If neither of the above error conditions are held, and jobz= 'V', then the first m elements of ifail are set to zero.

(global) INTEGER.iclustrArray, DIMENSION (2*NPROW*NPCOL). This array containsindices of eigenvectors corresponding to a cluster ofeigenvalues that could not be reorthogonalized due to

1891


insufficient workspace (see lwork, orfac and info).Eigenvectors corresponding to clusters of eigenvaluesindexed iclustr(2*i-1) to iclustr(2*i), could not bereorthogonalized due to lack of workspace. Hence theeigenvectors corresponding to these clusters may not beorthogonal.iclustr() is a zero terminated array.(iclustr(2*k).ne.0.and.clustr(2*k+1).eq.0) if andonly if k is the number of clusters.iclustr is not referenced if jobz = 'N'.

(global)gapREAL for pchegvxDOUBLE PRECISION for pzhegvx.Array, DIMENSION (NPROW*NPCOL).This array contains the gap between eigenvalues whoseeigenvectors could not be reorthogonalized. The outputvalues in this array correspond to the clusters indicated bythe array iclustr. As a result, the dot product betweeneigenvectors corresponding to the i-th cluster may be ashigh as (C*n)/gap(i), where C is a small constant.

(global) INTEGER.infoIf info = 0, the execution is successful.If info <0: the i-th argument is an array and the j-entryhad an illegal value, then info = -(i*100+j), if the i-thargument is a scalar and had an illegal value, then info =-i.If info > 0:If (mod(info,2).ne.0), then one or more eigenvectorsfailed to converge. Their indices are stored in ifail.If (mod(info,2,2).ne.0), then eigenvectors correspondingto one or more clusters of eigenvalues could not bereorthogonalized because of insufficient workspace. Theindices of the clusters are stored in the array iclustr.If (mod(info/4,2).ne.0), then space limit preventedp?sygvx from computing all of the eigenvectors betweenvl and vu. The number of eigenvectors computed is returnedin nz.

1892

If (mod(info/8,2).ne.0), then p?stebz failed to computeeigenvalues.If (mod(info/16,2).ne.0), then B was not positivedefinite. ifail(1) indicates the order of the smallest minorwhich is not positive definite.

1893


7ScaLAPACK Auxiliary and UtilityRoutines

This chapter describes the Intel® Math Kernel Library implementation of ScaLAPACK Auxiliary Routinesand Utility Functions and Routines. The library includes routines for both real and complex data.

NOTE. ScaLAPACK routines are provided with Intel® Cluster MKL product only which is asuperset of Intel MKL.

Routine naming conventions, mathematical notation, and matrix storage schemes used for ScaLAPACKauxiliary and utility routines are the same as described in previous chapters. Some routines and functionsmay have combined character codes, such as sc or dz. For example, the routine pscsum1 uses a complexinput array and returns a real value.

Auxiliary RoutinesTable 7-1 ScaLAPACK Auxiliary Routines


Routine Name

Conjugates a complex vector.c,zp?lacgv

Finds the index of the element whose real part has maximumabsolute value (similar to the Level 1 PBLAS p?amax, but usingthe absolute value to the real part).

c,zp?max1

Finds the element with maximum real part absolute value andits corresponding global index.

c,z?combamax1

Forms the 1-norm of a complex vector similar to Level 1 PBLASp?asum, but using the true absolute value.

sc,dzp?sum1

Computes an LU factorization of a general tridiagonal matrixwith no pivoting. The routine is called by p?dbtrs.

s,d,c,zp?dbtrsv

Computes an LU factorization of a general band matrix, usingpartial pivoting with row interchanges. The routine is called byp?dttrs.

s,d,c,zp?dttrsv

1895


Routine Name

Reduces a general rectangular matrix to real bidiagonal formby an orthogonal/unitary transformation (unblockedalgorithm).

s,d,c,zp?gebd2

Reduces a general matrix to upper Hessenberg form by anorthogonal/unitary similarity transformation (unblockedalgorithm).

s,d,c,zp?gehd2

Computes an LQ factorization of a general rectangular matrix(unblocked algorithm).

s,d,c,zp?gelq2

Computes a QL factorization of a general rectangular matrix(unblocked algorithm).

s,d,c,zp?geql2

Computes a QR factorization of a general rectangular matrix(unblocked algorithm).

s,d,c,zp?geqr2

Computes an RQ factorization of a general rectangular matrix(unblocked algorithm).

s,d,c,zp?gerq2

Computes an LU factorization of a general matrix, usingpartial pivoting with row interchanges (local blockedalgorithm).

s,d,c,zp?getf2

Reduces the first nb rows and columns of a generalrectangular matrix A to real bidiagonal form by anorthogonal/unitary transformation, and returns auxiliarymatrices that are needed to apply the transformation to theunreduced part of A.

s,d,c,zp?labrd

Estimates the 1-norm of a square matrix, using the reversecommunication for evaluating matrix-vector products.

s,d,c,zp?lacon

Looks for two consecutive small subdiagonal elements.s,dp?laconsb

Copies all or part of a distributed matrix to anotherdistributed matrix.

s,d,c,zp?lacp2

Copies from a global parallel array into a local replicatedarray or vice versa.

s,dp?lacp3

1896



Routine Name

Copies all or part of one two-dimensional array to another.s,d,c,zp?lacpy

Moves the eigenvectors from where they are computed toScaLAPACK standard block cyclic array.

s,d,c,zp?laevswp

Reduces the first nb columns of a general rectangular matrixA so that elements below the kth subdiagonal are zero, byan orthogonal/unitary transformation, and returns auxiliarymatrices that are needed to apply the transformation to theunreduced part of A.

s,d,c,zp?lahrd

Exploits IEEE arithmetic to accelerate the computations ofeigenvalues. (C interface function).

s,d,c,zp?laiect

Returns the value of the 1-norm, Frobenius norm,infinity-norm, or the largest absolute value of any element,of a general rectangular matrix.

s,d,c,zp?lange

Returns the value of the 1-norm, Frobenius norm,infinity-norm, or the largest absolute value of any element,of an upper Hessenberg matrix.

s,d,c,zp?lanhs

Returns the value of the 1-norm, Frobenius norm,infinity-norm, or the largest absolute value of any elementof a real symmetric or complex Hermitian matrix.

s,d,c,z/c,zp?lansy, p?lanhe

Returns the value of the 1-norm, Frobenius norm,infinity-norm, or the largest absolute value of any element,of a triangular matrix.

s,d,c,zp?lantr

Applies a permutation matrix to a general distributed matrix,resulting in row or column pivoting.

s,d,c,zp?lapiv

Scales a general rectangular matrix, using row and columnscaling factors computed by p?geequ.

s,d,c,zp?laqge

Scales a symmetric/Hermitian matrix, using scaling factorscomputed by p?poequ.

s,d,c,zp?laqsy

1897

ScaLAPACK Auxiliary and Utility Routines 7


Routine Name

Redistributes an array assuming that the input array bycolis distributed across rows and that all process columnscontain the same copy of bycol.

s,dp?lared1d

Redistributes an array assuming that the input array byrowis distributed across columns and that all process rowscontain the same copy of byrow .

s,dp?lared2d

Applies an elementary reflector to a general rectangularmatrix.

s,d,c,zp?larf

Applies a block reflector or its transpose/conjugate-transposeto a general rectangular matrix.

s,d,c,zp?larfb

Applies the conjugate transpose of an elementary reflectorto a general matrix.

c,zp?larfc

Generates an elementary reflector (Householder matrix).s,d,c,zp?larfg

Forms the triangular vector T of a block reflector H=I-VTVHs,d,c,zp?larft

Applies an elementary reflector as returned by p?tzrzf toa general matrix.

s,d,c,zp?larz

Applies a block reflector or its transpose/conjugate-transposeas returned by p?tzrzf to a general matrix.

s,d,c,zp?larzb

Applies (multiplies by) the conjugate transpose of anelementary reflector as returned by p?tzrzf to a generalmatrix.

c,zp?larzc

Forms the triangular factor T of a block reflector H=I-VTVH

as returned by p?tzrzf.s,d,c,zp?larzt

Multiplies a general rectangular matrix by a real scalardefined as Cto/Cfrom.

s,d,c,zp?lascl

Initializes the off-diagonal elements of a matrix to α and the

diagonal elements to β.

s,d,c,zp?laset

1898



Routine Name

Looks for a small subdiagonal element from the bottom ofthe matrix that it can safely set to zero.

s,dp?lasmsub

Updates a sum of squares represented in scaled form.s,d,c,zp?lassq

Performs a series of row interchanges on a generalrectangular matrix.

s,d,c,zp?laswp

Computes the trace of a general square distributed matrix.s,d,c,zp?latra

Reduces the first nb rows and columns of asymmetric/Hermitian matrix A to real tridiagonal form byan orthogonal/unitary similarity transformation.

s,d,c,zp?latrd

Reduces an upper trapezoidal matrix to upper triangularform by means of orthogonal/unitary transformations.

s,d,c,zp?latrz

Computes the product UUH or LHL, where U and L are upperor lower triangular matrices (local unblocked algorithm).

s,d,c,zp?lauu2

Computes the product UUH or LHL, where U and L are upperor lower triangular matrices.

s,d,c,zp?lauum

Forms the Wilkinson transform.s,dp?lawil

Generates all or part of the orthogonal/unitary matrix Q froma QL factorization determined by p?geqlf (unblockedalgorithm).

s,d,c,zp?org2l/p?ung2l

Generates all or part of the orthogonal/unitary matrix Q froma QR factorization determined by p?geqrf (unblockedalgorithm).

s,d,c,zp?org2r/p?ung2r

Generates all or part of the orthogonal/unitary matrix Q froman LQ factorization determined by p?gelqf (unblockedalgorithm).

s,d,c,zp?orgl2/p?ungl2

Generates all or part of the orthogonal/unitary matrix Q froman RQ factorization determined by p?gerqf (unblockedalgorithm).

s,d,c,zp?orgr2/p?ungr2

1899



Routine Name

Multiplies a general matrix by the orthogonal/unitary matrixfrom a QL factorization determined by p?geqlf (unblockedalgorithm).

s,d,c,zp?orm2l/p?unm2l

Multiplies a general matrix by the orthogonal/unitary matrixfrom a QR factorization determined by p?geqrf (unblockedalgorithm).

s,d,c,zp?orm2r/p?unm2r

Multiplies a general matrix by the orthogonal/unitary matrixfrom an LQ factorization determined by p?gelqf (unblockedalgorithm).

s,d,c,zp?orml2/p?unml2

Multiplies a general matrix by the orthogonal/unitary matrixfrom an RQ factorization determined by p?gerqf (unblockedalgorithm).

s,d,c,zp?ormr2/p?unmr2

Solves a single triangular linear system via frontsolve orbacksolve where the triangular matrix is a factor of a bandedmatrix computed by p?pbtrf.

s,d,c,zp?pbtrsv

Solves a single triangular linear system via frontsolve orbacksolve where the triangular matrix is a factor of atridiagonal matrix computed by p?pttrf.

s,d,c,zp?pttrsv

Computes the Cholesky factorization of asymmetric/Hermitian positive definite matrix (local unblockedalgorithm).

s,d,c,zp?potf2

Multiplies a vector by the reciprocal of a real scalar.s,d,cs,zdp?rscl

Reduces a symmetric/Hermitian definite generalizedeigenproblem to standard form, using the factorizationresults obtained from p?potrf (local unblocked algorithm).

s,d,c,zp?sygs2/p?hegs2

Reduces a symmetric/Hermitian matrix to real symmetrictridiagonal form by an orthogonal/unitary similaritytransformation (local unblocked algorithm).

s,d,c,zp?sytd2/p?hetd2

Computes the inverse of a triangular matrix (local unblockedalgorithm).

s,d,c,zp?trti2

1900



Routine Name

Sends multiple shifts through a small (single node) matrixto maximize the number of bulges that can be sent through.

s,d?lamsh

Applies Householder reflectors to matrices on either theirrows or columns.

s,d?laref

Sorts eigenpairs by real and complex data types.s,d?lasorte

Sorts numbers in increasing or decreasing order.s,d?lasrt2

Computes the eigenvectors corresponding to specifiedeigenvalues of a real symmetric tridiagonal matrix, usinginverse iteration.

s,d?stein2

Computes an LU factorization of a general band matrix withno pivoting (local unblocked algorithm).

s,d,c,z?dbtf2

Computes an LU factorization of a general band matrix withno pivoting (local blocked algorithm).

s,d,c,z?dbtrf

Computes an LU factorization of a general tridiagonal matrixwith no pivoting (local blocked algorithm).

s,d,c,z?dttrf

Solves a general tridiagonal system of linear equations usingthe LU factorization computed by ?dttrf.

s,d,c,z?dttrsv

Solves a symmetric (Hermitian) positive-definite tridiagonalsystem of linear equations, using the LDLH factorizationcomputed by ?pttrf.

s,d,c,z?pttrsv

Computes all eigenvalues and, optionally, eigenvectors ofa symmetric tridiagonal matrix using the implicit QL or QRmethod.

s,d?steqr2

1901


p?lacgvConjugates a complex vector.

Syntax

call pclacgv(n, x, ix, jx, descx, incx)

call pzlacgv(n, x, ix, jx, descx, incx)

Description

The routine conjugates a complex vector of length n, sub(x), where sub(x) denotes X(ix,jx:jx+n-1) if incx = descx( m_ ) and X(ix:ix+n-1, jx) if incx = 1.

Input Parameters

(global) INTEGER. The length of the distributed vectorsub(x).

n

(local).xCOMPLEX for pclacgvCOMPLEX*16 for pzlacgv.Pointer into the local memory toan array of DIMENSION (lld_x,*). On entry the vector tobe conjugated x(i) = X(ix+(jx-1)*m_x+(i-1)*incx ),

1 ≤ i ≤ n.

(global) INTEGER.The row index in the global array xindicating the first row of sub(x).

ix

(global) INTEGER. The column index in the global array xindicating the first column of sub(x).

jx

(global and local) INTEGER. Array, DIMENSION (dlen_). Thearray descriptor for the distributed matrix X.

descx

(global) INTEGER.The global increment for the elements ofX. Only two values of incx are supported in this version,namely 1 and m_x. incx must not be zero.

incx

Output Parameters

(local).xOn exit, the conjugated vector.

1902


p?max1Finds the index of the element whose real part hasmaximum absolute value (similar to the Level 1PBLAS p?amax, but using the absolute value to thereal part).

Syntax

call pcmax1(n, amax, indx, x, ix, jx, descx, incx)

call pzmax1(n, amax, indx, x, ix, jx, descx, incx)

Description

This routine computes the global index of the maximum element in absolute value of a distributedvector sub(x). The global index is returned in indx and the value is returned in amax, wheresub(x) denotes X(ix:ix+n-1, jx) if incx = 1, X(ix, jx:jx+n-1) if incx = m_x.

Input Parameters

(global) pointer to INTEGER. The number of components of

the distributed vector sub(x). n ≥ 0.

n

(local)xCOMPLEX for pcmax1.COMPLEX*16 for pzmax1Array containing the local pieces of a distributed matrix ofdimension of at least ((jx-1)*m_x + ix + ( n - 1)*abs( incx)). This array contains the entries of thedistributed vector sub (x).

(global) INTEGER.The row index in the global array Xindicating the first row of sub(x).

ix

(global) INTEGER. The column index in the global array Xindicating the first column of sub(x)

jx

(global and local) INTEGER. Array, DIMENSION (dlen_). Thearray descriptor for the distributed matrix X.

descx

(global) INTEGER.The global increment for the elements ofX. Only two values of incx are supported in this version,namely 1 and m_x. incx must not be zero.

incx

1903


Output Parameters

(global output) pointer to REAL.The absolute value of thelargest entry of the distributed vector sub(x) only in thescope of sub(x).

amax

(global output) pointer to INTEGER.The global index of theelement of the distributed vector sub(x) whose real parthas maximum absolute value.

indx

?combamax1Finds the element with maximum real part absolutevalue and its corresponding global index.

Syntax

call ccombamax1(v1, v2)

call zcombamax1(v1, v2)

Description

This routine finds the element having maximum real part absolute value as well as itscorresponding global index.

Input Parameters

(local)v1COMPLEX for ccombamax1COMPLEX*16 for zcombamax1 Array, DIMENSION 2. The firstmaximum absolute value element and its global index.v1(1) = amax, v1(2) = indx.

(local)v2COMPLEX for ccombamax1COMPLEX*16 for zcombamax1Array, DIMENSION 2. The second maximum absolute valueelement and its global index. v2(1) = amax, v2(2) =indx.

1904


Output Parameters

(local).v1The first maximum absolute value element and its globalindex. v1(1) = amax, v1(2) = indx.

p?sum1Forms the 1-norm of a complex vector similar toLevel 1 PBLAS p?asum, but using the true absolutevalue.

Syntax

call pscsum1(n, asum, x, ix, jx, descx, incx)

call pdzsum1(n, asum, x, ix, jx, descx, incx)

Description

This routine returns the sum of absolute values of a complex distributed vector sub(x) in asum,where sub(x) denotes X(ix:ix+n-1, jx:jx), if incx = 1, X(ix:ix, jx:jx+n-1), if incx= m_x.

Based on p?asum from the Level 1 PBLAS. The change is to use the 'genuine' absolute value.

Input Parameters

(global) pointer to INTEGER. The number of components of

the distributed vector sub(x). n ≥ 0.

n

(local ) COMPLEX for pscsum1xCOMPLEX*16 for pdzsum1.Array containing the local pieces of a distributed matrix ofdimension of at least ((jx-1)*m_x + ix + ( n - 1)*abs( incx)). This array contains the entries of thedistributed vector sub (x).

(global) INTEGER.The row index in the global array Xindicating the first row of sub(x).

ix

(global) INTEGER. The column index in the global array Xindicating the first column of sub(x)

jx

1905


(global and local) INTEGER. Array, DIMENSION 8. The arraydescriptor for the distributed matrix X.

descx

(global) INTEGER.The global increment for the elements ofX. Only two values of incx are supported in this version,namely 1 and m_x.

incx

Output Parameters

(local)asumPointer to REAL. The sum of absolute values of thedistributed vector sub(x) only in its scope.

p?dbtrsvComputes an LU factorization of a generaltriangular matrix with no pivoting. The routine iscalled by p?dbtrs.

Syntax

call psdbtrsv(uplo, trans, n, bwl, bwu, nrhs, a, ja, desca, b, ib, descb, af,laf, work, lwork, info)

call pddbtrsv(uplo, trans, n, bwl, bwu, nrhs, a, ja, desca, b, ib, descb, af,laf, work, lwork, info)

call pcdbtrsv(uplo, trans, n, bwl, bwu, nrhs, a, ja, desca, b, ib, descb, af,laf, work, lwork, info)

call pzdbtrsv(uplo, trans, n, bwl, bwu, nrhs, a, ja, desca, b, ib, descb, af,laf, work, lwork, info)

Description

This routines solves a banded triangular system of linear equations

A(1 :n, ja:ja+n-1) * X = B(ib:ib+n-1, 1 :nrhs) or

A(1 :n, ja:ja+n-1)T * X = B(ib:ib+n-1, 1 :nrhs) (for real flavors); A(1 :n, ja:ja+n-1)H * X

= B(ib:ib+n-1, 1 :nrhs) (for complex flavors),

1906


where A(1 :n, ja:ja+n-1) is a banded triangular matrix factor produced by the Gaussianelimination code PD@(dom_pre)BTRF and is stored in A(1 :n, ja:ja+n-1) and af. The matrixstored in A(1 :n, ja:ja+n-1) is either upper or lower triangular according to uplo, and thechoice of solving A(1 :n, ja:ja+n-1) or A(1 :n, ja:ja+n-1)T is dictated by the user by theparameter trans.

Routine p?dbtrf must be called first.

Input Parameters

(global) CHARACTER.uploIf uplo='U', the upper triangle of A(1:n, ja:ja+n-1) isstored,if uplo = 'L', the lower triangle of A(1:n, ja:ja+n-1) isstored.

(global) CHARACTER.transIf trans = 'N', solve with A(1:n, ja:ja+n-1),if trans = 'C', solve with conjugate transpose A(1:n,ja:ja+n-1).


A;(n≥ 0).

n

(global) INTEGER. Number of subdiagonals. 0 ≤ bwl ≤ n-1.bwl

(global) INTEGER. Number of subdiagonals. 0 ≤ bwu ≤ n-1.bwu


number of columns of the distributed submatrix B (nrhs≥0).

nrhs

(local).aREAL for psdbtrsvDOUBLE PRECISION for pddbtrsvCOMPLEX for pcdbtrsvCOMPLEX*16 for pzdbtrsv.Pointer into the local memory to an array with first

DIMENSION lld_a≥(bwl+bwu+1)(stored in desca). On entry,this array contains the local pieces of the n-by-nunsymmetric banded distributed Cholesky factor L or LT*A(1:n, ja:ja+n-1). This local portion is stored in the packed

1907


banded format used in LAPACK. Please see the ApplicationNotes below and the ScaLAPACK manual for more detail onthe format of distributed matrices.


ja

(global and local) INTEGER array of DIMENSION (dlen_).desca

if 1d type (dtype_a = 501 or 502), dlen≥ 7;

if 2d type (dtype_a = 1), dlen≥ 9. The array descriptor forthe distributed matrix A. Contains information of mappingof A to memory.

(local)bREAL for psdbtrsvDOUBLE PRECISION for pddbtrsvCOMPLEX for pcdbtrsvCOMPLEX*16 for pzdbtrsv.Pointer into the local memory to an array of local lead

DIMENSION lld_b≥nb. On entry, this array contains thelocal pieces of the right-hand sides B(ib:ib+n-1, 1:nrhs).


ib

(global and local) INTEGER array of DIMENSION (dlen_).descb

if 1d type (dtype_b =502), dlen≥7;

if 2d type (dtype_b =1), dlen≥9. The array descriptor forthe distributed matrix B. Contains information of mappingB to memory.

(local)lafINTEGER. Size of user-input Auxiliary Filling space af.

laf must be ≥nb*(bwl+bwu)+6*max(bwl, bwu)*max(bwl,bwu). If laf is not large enough, an error code is returnedand the minimum acceptable size will be returned in af(1).

(local).workREAL for psdbtrsvDOUBLE PRECISION for pddbtrsv

1908


COMPLEX for pcdbtrsvCOMPLEX*16 for pzdbtrsv.Temporary workspace. This space may be overwritten inbetween calls to routines.work must be the size given in lwork.

(local or global) INTEGER.lworkSize of user-input workspace work. If lwork is too small,the minimal acceptable size will be returned in work(1) andan error code is returned.

lwork≥ max(bwl, bwu)*nrhs.

Output Parameters

(local).aThis local portion is stored in the packed banded formatused in LAPACK. Please see the ScaLAPACK manual for moredetail on the format of distributed matrices.


b

(local).afREAL for psdbtrsvDOUBLE PRECISION for pddbtrsvCOMPLEX for pcdbtrsvCOMPLEX*16 for pzdbtrsv.Auxiliary Filling Space. Filling is created during thefactorization routine p?dbtrf and this is stored in af. If alinear system is to be solved using p?dbtrf after thefactorization routine, af must not be altered after thefactorization.

On exit, work( 1 ) contains the minimal lwork.work

(local).infoINTEGER. If info = 0, the execution is successful.< 0: If the i-th argument is an array and the j-entry hadan illegal value, then info = - (i*100+j), if the i-thargument is a scalar and had an illegal value, then info =-i.

1909

p?dttrsvComputes an LU factorization of a general bandmatrix, using partial pivoting with rowinterchanges. The routine is called by p?dttrs.

Syntax

call psdttrsv(uplo, trans, n, nrhs, dl, d, du, ja, desca, b, ib, descb, af,laf, work, lwork, info)

call pddttrsv(uplo, trans, n, nrhs, dl, d, du, ja, desca, b, ib, descb, af,laf, work, lwork, info)

call pcdttrsv(uplo, trans, n, nrhs, dl, d, du, ja, desca, b, ib, descb, af,laf, work, lwork, info)

call pzdttrsv(uplo, trans, n, nrhs, dl, d, du, ja, desca, b, ib, descb, af,laf, work, lwork, info)

Description

This routine solves a tridiagonal triangular system of linear equations

A(1 :n, ja:ja+n-1) * X = B(ib:ib+n-1, 1 :nrhs) or

A(1 :n, ja:ja+n-1)T * X = B(ib:ib+n-1, 1 :nrhs) for real flavors; A(1 :n,ja:ja+n-1)H * X = B(ib:ib+n-1, 1 :nrhs) for complex flavors,

where A(1 :n, ja:ja+n-1) is a tridiagonal matrix factor produced by the Gaussian eliminationcode PS@(dom_pre)TTRF and is stored in A(1 :n, ja:ja+n-1) and af.

The matrix stored in A(1 :n, ja:ja+n-1) is either upper or lower triangular according touplo, and the choice of solving A(1 :n, ja:ja+n-1) or A(1 :n, ja:ja+n-1)T is dictatedby the user by the parameter trans.

Routine p?dttrf must be called first.

Input Parameters

(global) CHARACTER.uploIf uplo='U', the upper triangle of A(1:n, ja:ja+n-1) isstored,if uplo = 'L', the lower triangle of A(1:n, ja:ja+n-1) isstored.

1910


(global) CHARACTER.transIf trans = 'N', solve with A(1:n, ja:ja+n-1),if trans = 'C', solve with conjugate transpose A(1:n,ja:ja+n-1).


A;(n≥ 0).

n

(global) INTEGER. The number of right-hand sides; thenumber of columns of the distributed submatrix

B(ib:ib+n-1, 1:nrhs). (nrhs≥ 0).

nrhs

(local).dlREAL for psdttrsvDOUBLE PRECISION for pddttrsvCOMPLEX for pcdttrsvCOMPLEX*16 for pzdttrsv.Pointer to local part of global vector storing the lowerdiagonal of the matrix.Globally, dl(1) is not referenced, and dl must be alignedwith d.

Must be of size ≥desca( nb_ ).

(local).dREAL for psdttrsvDOUBLE PRECISION for pddttrsvCOMPLEX for pcdttrsvCOMPLEX*16 for pzdttrsv.Pointer to local part of global vector storing the maindiagonal of the matrix.

(local).duREAL for psdttrsvDOUBLE PRECISION for pddttrsvCOMPLEX for pcdttrsvCOMPLEX*16 for pzdttrsv.Pointer to local part of global vector storing the upperdiagonal of the matrix.Globally, du(n) is not referenced, and du must be alignedwith d.

1911



ja

(global and local). INTEGER array of DIMENSION (dlen_).desca

if 1d type (dtype_a = 501 or 502), dlen≥ 7;

if 2d type (dtype_a = 1), dlen≥ 9.The array descriptor for the distributed matrix A. Containsinformation of mapping of A to memory.

(local)bREAL for psdttrsvDOUBLE PRECISION for pddttrsvCOMPLEX for pcdttrsvCOMPLEX*16 for pzdttrsv.Pointer into the local memory to an array of local lead

DIMENSION lld_b≥nb. On entry, this array contains thelocal pieces of the right-hand sides B(ib:ib+n-1, 1:nrhs).

(global). INTEGER. The row index in the global array b thatpoints to the first row of the matrix to be operated on (whichmay be either all of b or a submatrix of B).

ib

(global and local).INTEGER array of DIMENSION (dlen_).descb

if 1d type (dtype_b = 502), dlen≥7;

if 2d type (dtype_b = 1), dlen≥ 9.The array descriptor for the distributed matrix B. Containsinformation of mapping B to memory.

(local).lafINTEGER.Size of user-input Auxiliary Filling space af.

laf must be ≥ 2*(nb+2). If laf is not large enough, anerror code is returned and the minimum acceptable size willbe returned in af(1).

(local).workREAL for psdttrsvDOUBLE PRECISION for pddttrsvCOMPLEX for pcdttrsv

1912


COMPLEX*16 for pzdttrsv.Temporary workspace. This space may be overwritten inbetween calls to routines.work must be the size given in lwork.

(local or global).INTEGER.lworkSize of user-input workspace work. If lwork is too small,the minimal acceptable size will be returned in work(1) andan error code is returned.

lwork≥ 10*npcol+4*nrhs.

Output Parameters

(local).dlOn exit, this array contains information containing thefactors of the matrix.

On exit, this array contains information containing the

factors of the matrix. Must be of size ≥desca (nb_ ).

d


b

(local).afREAL for psdttrsvDOUBLE PRECISION for pddttrsvCOMPLEX for pcdttrsvCOMPLEX*16 for pzdttrsv.Auxiliary Filling Space. Filling is created during thefactorization routine p?dttrf and this is stored in af. If alinear system is to be solved using p?dttrs after thefactorization routine, af must not be altered after thefactorization.


(local). INTEGER.infoIf info=0, the execution is successful.if info< 0: If the i-th argument is an array and the j-entryhad an illegal value, then info = - (i*100+j), if the i-thargument is a scalar and had an illegal value, then info =-i.

1913

p?gebd2Reduces a general rectangular matrix to realbidiagonal form by an orthogonal/unitarytransformation (unblocked algorithm).

Syntax

call psgebd2(m, n, a, ia, ja, desca, d, e, tauq, taup, work, lwork, info)

call pdgebd2(m, n, a, ia, ja, desca, d, e, tauq, taup, work, lwork, info)

call pcgebd2(m, n, a, ia, ja, desca, d, e, tauq, taup, work, lwork, info)

call pzgebd2(m, n, a, ia, ja, desca, d, e, tauq, taup, work, lwork, info)

Description

This routine reduces a real/complex general m-by-n distributed matrix sub(A) = A(ia:ia+m-1,ja:ja+n-1) to upper or lower bidiagonal form B by an orthogonal/unitary transformation:

Q' * sub(A) * P = B.

If m ≥ n, B is the upper bidiagonal; if m < n, B is the lower bidiagonal.

Input Parameters

(global) INTEGER.mThe number of rows of the distributed submatrix sub(A). (m

≥ 0).

(global) INTEGER.n

The order of the distributed submatrix sub(A). (n ≥ 0).

(local).aREAL for psgebd2DOUBLE PRECISION for pdgebd2COMPLEX for pcgebd2COMPLEX*16 for pzgebd2.Pointer into the local memory to an array ofDIMENSION(lld_a, LOCc(ja+n-1)).On entry, this array contains the local pieces of the generaldistributed matrix sub(A).

1914


ia, ja

(global and local) INTEGER array, DIMENSION (dlen_). Thearray descriptor for the distributed matrix A.

desca

(local).workREAL for psgebd2DOUBLE PRECISION for pdgebd2COMPLEX for pcgebd2COMPLEX*16 for pzgebd2.This is a workspace array of DIMENSION (lwork).

(local or global) INTEGER.lworkThe dimension of the array work.

lwork is local input and must be at least lwork ≥ max(mpa0, nqa0 ), where nb = mb_a = nb_a, iroffa =mod( ia-1, nb ), iarow = indxg2p ( ia, nb,myrow, rsrc_a, nprow ), iacol = indxg2p ( ja,nb, mycol, csrc_a, npcol ), mpa0 = numroc(m+iroffa, nb, myrow, iarow, nprow ), nqa0 =numroc( n+icoffa, nb, mycol, iacol, npcol ).indxg2p and numroc are ScaLAPACK tool functions; myrow,mycol, nprow, and npcol can be determined by calling thesubroutine blacs_gridinfo.If lwork = -1, then lwork is global input and a workspacequery is assumed; the routine only calculates the minimumand optimal size for all work arrays. Each of these valuesis returned in the first entry of the corresponding work array,and no error message is issued by pxerbla.

Output Parameters

(local).a

On exit, if m ≥ n, the diagonal and the first superdiagonalof sub(A) are overwritten with the upper bidiagonal matrixB; the elements below the diagonal, with the array tauq,represent the orthogonal/unitary matrix Q as a product ofelementary reflectors, and the elements above the first

1915


superdiagonal, with the array taup, represent the orthogonalmatrix P as a product of elementary reflectors. If m < n,the diagonal and the first subdiagonal are overwritten withthe lower bidiagonal matrix B; the elements below the firstsubdiagonal, with the array tauq, represent theorthogonal/unitary matrix Q as a product of elementaryreflectors, and the elements above the diagonal, with thearray taup, represent the orthogonal matrix P as a productof elementary reflectors. See Applications Notes below.

(local)dREAL for psgebd2DOUBLE PRECISION for pdgebd2COMPLEX for pcgebd2COMPLEX*16 for pzgebd2.

Array, DIMENSION LOCc(ja+min(m,n)-1) if m ≥ n;LOCr(ia+min(m,n)-1) otherwise. The distributed diagonalelements of the bidiagonal matrix B: d(i) = a(i,i). d istied to the distributed matrix A.

(local)eREAL for psgebd2DOUBLE PRECISION for pdgebd2COMPLEX for pcgebd2COMPLEX*16 for pzgebd2.

Array, DIMENSION LOCc(ja+min(m,n)-1) if m ≥ n;LOCr(ia+min(m,n)-2) otherwise. The distributed diagonalelements of the bidiagonal matrix B:

if m ≥ n, e(i) = a(i, i+1) for i = 1, 2, ... , n-1;if m < n, e(i) = a(i+1, i) for i = 1, 2, ..., m-1.e is tied to the distributed matrix A.

(local).tauqREAL for psgebd2DOUBLE PRECISION for pdgebd2COMPLEX for pcgebd2COMPLEX*16 for pzgebd2.

1916

Array, DIMENSIONLOCc(ja+min(m,n)-1). The scalar factorsof the elementary reflectors which represent theorthogonal/unitary matrix Q. tauq is tied to the distributedmatrix A.

(local).taupREAL for psgebd2DOUBLE PRECISION for pdgebd2COMPLEX for pcgebd2COMPLEX*16 for pzgebd2.Array, DIMENSION LOCr(ia+min(m,n)-1). The scalarfactors of the elementary reflectors which represent theorthogonal/unitary matrix P. taup is tied to the distributedmatrix A.

On exit, work(1) returns the minimal and optimal lwork.work

(local)infoINTEGER.If info = 0, the execution is successful.if info < 0: If the i-th argument is an array and thej-entry had an illegal value, then info = - (i*100+j),if the i-th argument is a scalar and had an illegal value,then info = -i.

Application Notes


If m≥n,

Q = H(1) H(2) . . . H(n) and P = G(1) G(2). . . G(n-1)


H(i) = I - tauq *v *v' and G(i) = I - taup *u*u',

where tauq and taup are real/complex scalars, and v and u are real/complex vectors. v(1:i-1) = 0, v(i) = 1, and v(i+i:m) is stored on exit in

A(ia+i-ia+m-1, a+i-1);

u(1:i) = 0, u(i+1) = 1, and u(i+2:n) is stored on exit in A(ia+i-1, ja+i+1:ja+n-1);

tauq is stored in TAUQ(ja+i-1) and taup in TAUP(ia+i-1).

1917

If m < n,

v(1: i) = 0, v(i+1) = 1, and v(i+2:m) is stored on exit in A(ia+i+1: ia+m-1, ja+i-1);

u(1: i-1) = 0, u(i) = 1, and u(i+1 :n) is stored on exit in A(ia+i-1,ja+i:ja+n-1);

tauq is stored in TAUQ(ja+i-1) and taup in TAUP(ia+i-1).

The contents of sub(A) on exit are illustrated by the following examples:

where d and e denote diagonal and off-diagonal elements of B, vi denotes an element of thevector defining H(i), and ui an element of the vector defining G(i).

p?gehd2Reduces a general matrix to upper Hessenbergform by an orthogonal/unitary similaritytransformation (unblocked algorithm).

Syntax

call psgehd2(n, ilo, ihi, a, ia, ja, desca, tau, work, lwork, info)

call pdgehd2(n, ilo, ihi, a, ia, ja, desca, tau, work, lwork, info)

call pcgehd2(n, ilo, ihi, a, ia, ja, desca, tau, work, lwork, info)

call pzgehd2(n, ilo, ihi, a, ia, ja, desca, tau, work, lwork, info)

1918

Description

This routine reduces a real/complex general distributed matrix sub(A) to upper Hessenbergform H by an orthogonal/unitary similarity transformation: Q' * sub(A) * Q = H, wheresub(A) = A(ia+n-1 :ia+n-1, ja+n-1 :ja+n-1).

Input Parameters

(global) INTEGER. The order of the distributed submatrix A.

(n≥ 0).

n

(global) INTEGER. It is assumed that sub(A) is already uppertriangular in rows ia:ia+ilo-2 and ia+ihi:ia+n-1 andcolumns ja:ja+jlo-2 and ja+jhi:ja+n-1. See ApplicationNotes for further information.

ilo, ihi

If n≥ 0, 1 ≤ ilo ≤ ihi ≤ n; otherwise set ilo = 1, ihi= n.

(local).aREAL for psgehd2DOUBLE PRECISION for pdgehd2COMPLEX for pcgehd2COMPLEX*16 for pzgehd2.Pointer into the local memory to an array ofDIMENSION(lld_a, LOCc(ja+n-1)).On entry, this array contains the local pieces of the n-by-ngeneral distributed matrix sub(A) to be reduced.

(global) INTEGER. The row and column indices in the globalarray A indicating the first row and the first column of thesubmatrix A, respectively.

ia, ja


desca

(local).workREAL for psgehd2DOUBLE PRECISION for pdgehd2COMPLEX for pcgehd2COMPLEX*16 for pzgehd2.This is a workspace array of DIMENSION (lwork).

(local or global). INTEGER.lwork

1919


The dimension of the array work.

lwork is local input and must be at least lwork≥nb + max(npa0, nb ), where nb = mb_a = nb_a, iroffa = mod(ia-1, nb ), iarow = indxg2p ( ia, nb, myrow,rsrc_a, nprow ),npa0 = numroc(ihi+iroffa, nb,myrow, iarow, nprow ).indxg2p and numroc are ScaLAPACK tool functions;myrow,mycol, nprow, and npcol can be determined by calling thesubroutine blacs_gridinfo.If lwork = -1, then lwork is global input and a workspacequery is assumed; the routine only calculates the minimumand optimal size for all work arrays. Each of these valuesis returned in the first entry of the corresponding work array,and no error message is issued by pxerbla.

Output Parameters

(local). On exit, the upper triangle and the first subdiagonalof sub(A) are overwritten with the upper Hessenberg matrixH, and the elements below the first subdiagonal, with the

a

array tau, represent the orthogonal/unitary matrix Q as aproduct of elementary reflectors. See Application Notesbelow.

(local).tauREAL for psgehd2DOUBLE PRECISION for pdgehd2COMPLEX for pcgehd2COMPLEX*16 for pzgehd2.Array, DIMENSION LOCc(ja+n-2) The scalar factors of theelementary reflectors (see Application Notes below).Elements ja:ja+ilo-2 and ja+ihi:ja+n-2 of tau are setto zero. tau is tied to the distributed matrix A.


(local).INTEGER.infoIf info = 0, the execution is successful.

1920


if info < 0: If the i-th argument is an array and the j-entryhad an illegal value, then info = - (i*100+j), if the i-thargument is a scalar and had an illegal value, then info =-i.

Application Notes

The matrix Q is represented as a product of (ihi-ilo) elementary reflectors

Q = H(ilo) H(ilo+1) . . . H(ihi-1).


H(i) = I - tau *v *v',

where tau is a real/complex scalar, and v is a real/complex vector with v(1: i) = 0, v(i+1)= 1 and v(ihi+1: n) = 0; v(i+2: ihi) is stored on exit in A(ia+ilo+i:ia+ihi-1,ia+ilo+i-2), and tau in tau(ja+ilo+i-2).

The contents of A(ia:ia+n-1, ja:ja+n-1) are illustrated by the following example, with n =7, ilo = 2 and ihi = 6:

where a denotes an element of the original matrix sub(A), h denotes a modified element of theupper Hessenberg matrix H, and vi denotes an element of the vector defining H(ja+ilo+i-2).

1921

p?gelq2Computes an LQ factorization of a generalrectangular matrix (unblocked algorithm).

Syntax

call psgelq2(m, n, a, ia, ja, desca, tau, work, lwork, info)

call pdgelq2(m, n, a, ia, ja, desca, tau, work, lwork, info)

call pcgelq2(m, n, a, ia, ja, desca, tau, work, lwork, info)

call pzgelq2(m, n, a, ia, ja, desca, tau, work, lwork, info)

Description

This routine computes an LQ factorization of a real/complex distributed m-by-n matrix sub(A)= A(ia:ia+m-1, ja:ja+n-1) = L *Q.

Input Parameters

(global) INTEGER.mThe number of rows to be operated on, that is, the number

of rows of the distributed submatrix sub(A). (m≥0).

(global) INTEGER.nThe number of columns to be operated on, that is, thenumber of columns of the distributed submatrix sub(A).

(n≥0).

(local).aREAL for psgelq2DOUBLE PRECISION for pdgelq2COMPLEX for pcgelq2COMPLEX*16 for pzgelq2.Pointer into the local memory to an array ofDIMENSION(lld_a, LOCc(ja+n-1)).On entry, this array contains the local pieces of the m-by-ndistributed matrix sub(A) which is to be factored.


ia, ja

1922



desca

(local).workREAL for psgelq2DOUBLE PRECISION for pdgelq2COMPLEX for pcgelq2COMPLEX*16 for pzgelq2.This is a workspace array of DIMENSION (lwork).


lwork is local input and must be at least lwork≥nq0 + max(1, mp0 ), where iroff = mod(ia-1, mb_a), icoff =mod(ja-1, nb_a), iarow = indxg2p(ia, mb_a, myrow,rsrc_a, nprow), iacol = indxg2p(ja, nb_a, mycol, csrc_a,npcol), mp0 = numroc(m+iroff, mb_a, myrow, iarow,nprow), nq0 = numroc(n+icoff, nb_a, mycol, iacol,npcol),indxg2p and numroc are ScaLAPACK tool functions; myrow,mycol, nprow, and npcol can be determined by calling thesubroutine blacs_gridinfo.If lwork = -1, then lwork is global input and a workspacequery is assumed; the routine only calculates the minimumand optimal size for all work arrays. Each of these valuesis returned in the first entry of the corresponding work array,and no error message is issued by pxerbla.

Output Parameters

(local).aOn exit, the elements on and below the diagonal of sub(A)contain the m by min(m,n) lower trapezoidal matrix L (L is

lower triangular if m ≤ n); the elements above the diagonal,with the array tau, represent the orthogonal/unitary matrixQ as a product of elementary reflectors (see ApplicationNotes below).

(local).tauREAL for psgelq2

1923


DOUBLE PRECISION for pdgelq2COMPLEX for pcgelq2COMPLEX*16 for pzgelq2.Array, DIMENSION LOCr(ia+min(m, n)-1). This arraycontains the scalar factors of the elementary reflectors. tauis tied to the distributed matrix A.


(local).INTEGER. If info = 0, the execution is successful.if info < 0: If the i-th argument is an array and the j-entryhad an illegal value, then info = - (i*100+j), if the i-thargument is a scalar and had an illegal value, then info =-i.

info

Application Notes


Q = H(ia+k-1) H(ia+k-2). . . H(ia) for real flavors,Q = H(ia+k-1)' H(ia+k-2)'. . . H(ia)' forcomplex flavors,

where k = min(m,n).


H(i) = I - tau * v * v'

where tau is a real/complex scalar, and v is a real/complex vector with v(1: i-1) = 0 andv(i) = 1; v(i+1: n) (for real flavors) or conjg(v(i+1: n)) (for complex flavors) is storedon exit in A(ia+i-1,ja+i:ja+n-1), and tau in TAU(ia+i-1).

p?geql2Computes a QL factorization of a generalrectangular matrix (unblocked algorithm).

Syntax

call psgeql2(m, n, a, ia, ja, desca, tau, work, lwork, info)

call pdgeql2(m, n, a, ia, ja, desca, tau, work, lwork, info)

call pcgeql2(m, n, a, ia, ja, desca, tau, work, lwork, info)

call pzgeql2(m, n, a, ia, ja, desca, tau, work, lwork, info)

1924

Description

The routine computes a QL factorization of a real/complex distributed m-by-n matrix sub(A) =A(ia:ia+m-1, ja:ja+n-1)= Q *L.

Input Parameters


of rows of the distributed submatrix sub(A). (m≥ 0).

(global) INTEGER.nThe number of columns to be operated on, that is, the

number of columns of the distributed submatrix sub(A). (n≥0).

(local).aREAL for psgeql2DOUBLE PRECISION for pdgeql2COMPLEX for pcgeql2COMPLEX*16 for pzgeql2.Pointer into the local memory to an array of DIMENSION(lld_a,LOCc (ja+n-1)).On entry, this array contains the local pieces of the m-by-ndistributed matrix sub(A) which is to be factored.


ia, ja


desca

(local).workREAL for psgeql2DOUBLE PRECISION for pdgeql2COMPLEX for pcgeql2COMPLEX*16 for pzgeql2.This is a workspace array of DIMENSION (lwork).


1925


lwork is local input and must be at least lwork≥mp0 +max(1, nq0), where iroff = mod( ia-1, mb_a ), icoff =mod( ja-1, nb_a ), iarow = indxg2p( ia, mb_a, myrow,rsrc_a, nprow ), iacol = indxg2p(ja, nb_a, mycol, csrc_a,npcol ), mp0 = numroc( m+iroff, mb_a, myrow, iarow,nprow ), nq0 = numroc( n+icoff, nb_a, mycol, iacol,npcol ),indxg2p and numroc are ScaLAPACK tool functions; myrow,mycol, nprow, and npcol can be determined by calling thesubroutine blacs_gridinfo.If lwork = -1, then lwork is global input and a workspacequery is assumed; the routine only calculates the minimumand optimal size for all work arrays. Each of these valuesis returned in the first entry of the corresponding work array,and no error message is issued by pxerbla.

Output Parameters

(local).aOn exit,

if m ≥ n, the lower triangle of the distributed submatrixA(ia+m-n:ia+m-1, ja:ja+n-1) contains the n-by-n lower

triangular matrix L; if m ≤ n, the elements on and belowthe (n-m)-th superdiagonal contain the m-by-n lowertrapezoidal matrix L; the remaining elements, with the arraytau, represent the orthogonal/ unitary matrix Q as a productof elementary reflectors (see Application Notes below).

(local).tauREAL for psgeql2DOUBLE PRECISION for pdgeql2COMPLEX for pcgeql2COMPLEX*16 for pzgeql2.Array, DIMENSION LOCc(ja+n-1). This array contains thescalar factors of the elementary reflectors. tau is tied to thedistributed matrix A.


(local). INTEGER.info

1926


If info = 0, the execution is successful. if info < 0: If thei-th argument is an array and the j-entry had an illegalvalue, then info = - (i*100+j), if the i-th argument is ascalar and had an illegal value, then info = -i.

Application Notes


Q = H(ja+k-1) . . . H(ja+1) H(ja), where k = min(m,n).


H(i) = I- tau * v * v'

where tau is a real/complex scalar, and v is a real/complex vector with v(m-k+i+1: m) = 0and v(m-k+i) = 1; v(1: m-k+i-1) is stored on exit in A(ia:ia+m-k+i-2, ja+n-k+i-1), andtau in TAU(ja+n-k+i-1).

p?geqr2Computes a QR factorization of a generalrectangular matrix (unblocked algorithm).

Syntax

call psgeqr2(m, n, a, ia, ja, desca, tau, work, lwork, info)

call pdgeqr2(m, n, a, ia, ja, desca, tau, work, lwork, info)

call pcgeqr2(m, n, a, ia, ja, desca, tau, work, lwork, info)

call pzgeqr2(m, n, a, ia, ja, desca, tau, work, lwork, info)

Description

This routine computes a QR factorization of a real/complex distributed m-by-n matrix sub(A) =A(ia:ia+m-1, ja:ja+n-1)= Q*R.

Input Parameters

(global). INTEGER.mThe number of rows to be operated on, that is, the number


1927

(global).INTEGER. The number of columns to be operatedon, that is, the number of columns of the distributed

submatrix sub(A). (n≥0).

n

(local).aREAL for psgeqr2DOUBLE PRECISION for pdgeqr2COMPLEX for pcgeqr2COMPLEX*16 for pzgeqr2.Pointer into the local memory to an array of DIMENSION(lld_a, LOCc (ja+n-1)).On entry, this array contains the local pieces of the m-by-ndistributed matrix sub(A) which is to be factored.


ia, ja


desca

(local).workREAL for psgeqr2DOUBLE PRECISION for pdgeqr2COMPLEX for pcgeqr2COMPLEX*16 for pzgeqr2.This is a workspace array of DIMENSION (lwork).

(local or global). INTEGER.lworkThe dimension of the array work.

lwork is local input and must be at least lwork≥mp0 +max(1, nq0), where iroff = mod( ia-1, mb_a ), icoff= mod( ja-1, nb_a ), iarow = indxg2p( ia, mb_a,myrow, rsrc_a, nprow ), iacol = indxg2p( ja,nb_a, mycol, csrc_a, npcol ), mp0 = numroc(m+iroff, mb_a, myrow, iarow, nprow ), nq0 =numroc( n+icoff, nb_a, mycol, iacol, npcol ),indxg2p and numroc are ScaLAPACK tool functions; myrow,mycol, nprow, and npcol can be determined by calling thesubroutine blacs_gridinfo.

1928


Output Parameters

(local).aOn exit, the elements on and above the diagonal of sub(A)contain the min(m,n) by n upper trapezoidal matrix R (R is

upper triangular if m≥n); the elements below the diagonal,with the array tau, represent the orthogonal/unitary matrixQ as a product of elementary reflectors (see ApplicationNotes below).

(local).tauREAL for psgeqr2DOUBLE PRECISION for pdgeqr2COMPLEX for pcgeqr2COMPLEX*16 for pzgeqr2.Array, DIMENSION LOCc(ja+min(m,n)-1). This arraycontains the scalar factors of the elementary reflectors. tauis tied to the distributed matrix A.


(local). INTEGER.infoIf info = 0, the execution is successful. if info < 0:If the i-th argument is an array and the j-entry had anillegal value, then info = - (i*100+j),if the i-th argument is a scalar and had an illegal value,then info = -i.

Application Notes


Q = H(ja) H(ja+1) . . . H(ja+k-1), where k = min(m,n).


H(j)= I - tau * v * v',

1929

where tau is a real/complex scalar, and v is a real/complex vector with v(1: i-1) = 0 and v(i)= 1; v(i+1: m) is stored on exit in A(ia+i:ia+m-1, ja+i-1), and tau in TAU(ja+i-1).

p?gerq2Computes an RQ factorization of a generalrectangular matrix (unblocked algorithm).

Syntax

call psgerq2(m, n, a, ia, ja, desca, tau, work, lwork, info)

call pdgerq2(m, n, a, ia, ja, desca, tau, work, lwork, info)

call pcgerq2(m, n, a, ia, ja, desca, tau, work, lwork, info)

call pzgerq2(m, n, a, ia, ja, desca, tau, work, lwork, info)

Description

This routine computes an RQ factorization of a real/complex distributed m-by-n matrix sub(A)= A(ia:ia+m-1, ja:ja+n-1) = R*Q.

Input Parameters




submatrix sub(A). (n≥0).

n

(local).aREAL for psgerq2DOUBLE PRECISION for pdgerq2COMPLEX for pcgerq2COMPLEX*16 for pzgerq2.Pointer into the local memory to an array of DIMENSION(lld_a,LOCc(ja+n-1)).On entry, this array contains the local pieces of the m-by-ndistributed matrix sub(A) which is to be factored.

1930



ia, ja


desca

(local).workREAL for psgerq2DOUBLE PRECISION for pdgerq2COMPLEX for pcgerq2COMPLEX*16 for pzgerq2.This is a workspace array of DIMENSION (lwork).

(local or global). INTEGER.lworkThe dimension of the array work.

lwork is local input and must be at least lwork≥nq0 +max( 1, mp0 ), whereiroff = mod(ia-1, mb_a ), icoff = mod( ja-1,nb_a ),iarow = indxg2p( ia, mb_a, myrow, rsrc_a, nprow),iacol = indxg2p( ja, nb_a, mycol, csrc_a, npcol), mp0 = numroc( m+iroff, mb_a, myrow, iarow,nprow ),nq0 = numroc( n+icoff, nb_a, mycol, iacol, npcol),indxg2p and numroc are ScaLAPACK tool functions; myrow,mycol, nprow, and npcol can be determined by calling thesubroutine blacs_gridinfo.If lwork = -1, then lwork is global input and a workspacequery is assumed; the routine only calculates the minimumand optimal size for all work arrays. Each of these valuesis returned in the first entry of the corresponding work array,and no error message is issued by pxerbla.

Output Parameters

(local).aOn exit,

1931


if m ≤ n, the upper triangle of A(ia+m-n:ia+m-1, ja:ja+n-1)

contains the m-by-m upper triangular matrix R; if m ≥ n, theelements on and above the (m-n)-th subdiagonal containthe m-by-n upper trapezoidal matrix R; the remainingelements, with the array tau, represent the orthogonal/unitary matrix Q as a product of elementary reflectors (seeApplication Notes below).

(local).tauREAL for psgerq2DOUBLE PRECISION for pdgerq2COMPLEX for pcgerq2COMPLEX*16 for pzgerq2.Array, DIMENSION LOCr(ia+m -1). This array contains thescalar factors of the elementary reflectors. tau is tied to thedistributed matrix A.


(local). INTEGER.infoIf info = 0, the execution is successful.if info < 0: If the i-th argument is an array and the j-entryhad an illegal value, then info = - (i*100+j), if the i-thargument is a scalar and had an illegal value, then info =-i.

Application Notes


Q = H(ia) H(ia+1) . . . H(ia+k-1) for real flavors,

Q = H(ia)' H(ia+1)' . . . H(ia+k-1)' for complex flavors,

where k = min(m, n).


H(i) = I - tau *v *v',

where tau is a real/complex scalar, and v is a real/complex vector with v(n-k+i+1:n) = 0and v(n-k+i) = 1; v(1:n-k+i-1) for real flavors or conjg(v(1:n-k+i-1)) for complexflavors is stored on exit in A(ia+m-k+i-1, ja:ja+n-k+i-2), and tau in TAU(ia+m-k+i-1).

1932

p?getf2Computes an LU factorization of a general matrix,using partial pivoting with row interchanges (localblocked algorithm).

Syntax

call psgetf2(m, n, a, ia, ja, desca, ipiv, info)

call pdgetf2(m, n, a, ia, ja, desca, ipiv, info)

call pcgetf2(m, n, a, ia, ja, desca, ipiv, info)

call pzgetf2(m, n, a, ia, ja, desca, ipiv, info)

Description

This routine computes an LU factorization of a general m-by-n distributed matrix sub(A) =A(ia:ia+m-1, ja:ja+n-1) using partial pivoting with row interchanges.

The factorization has the form sub(A) = P * L * U, where P is a permutation matrix, L islower triangular with unit diagonal elements (lower trapezoidal if m>n), and U is upper triangular(upper trapezoidal if m < n). This is the right-looking Parallel Level 2 BLAS version of thealgorithm.

Input Parameters




submatrix sub(A). (nb_a - mod(ja-1, nb_a)≥n≥0).

n

(local).aREAL for psgetf2DOUBLE PRECISION for pdgetf2COMPLEX for pcgetf2COMPLEX*16 for pzgetf2.Pointer into the local memory to an array ofDIMENSION(lld_a, LOCc(ja+n-1)).

1933

On entry, this array contains the local pieces of the m-by-ndistributed matrix sub(A).

(global) INTEGER. The row and column indices in the globalarray a indicating the first row and the first column of thesubmatrix sub(A), respectively.

ia, ja


desca

Output Parameters

(local). INTEGER.ipivArray, DIMENSION(LOCr(m_a) + mb_a). This array containsthe pivoting information. ipiv(i) -> The global row thatlocal row i was swapped with. This array is tied to thedistributed matrix A.

(local). INTEGER.infoIf info = 0: successful exit.If info < 0:

• if the i-th argument is an array and the j-entry had anillegal value, then info = -(i*100+j),

• if the i-th argument is a scalar and had an illegal value,then info = - i.

If info > 0: If info = k, u(ia+k-1, ja+k-1 ) is exactlyzero. The factorization has been completed, but the factoru is exactly singular, and division by zero will occur if it isused to solve a system of equations.

1934

p?labrdReduces the first nb rows and columns of a generalrectangular matrix A to real bidiagonal form by anorthogonal/unitary transformation, and returnsauxiliary matrices that are needed to apply thetransformation to the unreduced part of A.

Syntax

call pslabrd(m, n, nb, a, ia, ja, desca, d, e, tauq, taup, x, ix, jx, descx,y, iy, jy, descy, work)

call pdlabrd(m, n, nb, a, ia, ja, desca, d, e, tauq, taup, x, ix, jx, descx,y, iy, jy, descy, work)

call pclabrd(m, n, nb, a, ia, ja, desca, d, e, tauq, taup, x, ix, jx, descx,y, iy, jy, descy, work)

call pzlabrd(m, n, nb, a, ia, ja, desca, d, e, tauq, taup, x, ix, jx, descx,y, iy, jy, descy, work)

Description

This routine reduces the first nb rows and columns of a real/complex general m-by-n distributedmatrix sub(A) = A(ia:ia+m-1, ja:ja+n-1) to upper or lower bidiagonal form by anorthogonal/unitary transformation Q'* A * P, and returns the matrices X and Y necessary toapply the transformation to the unreduced part of sub(A).

If m ≥n, sub(A) is reduced to upper bidiagonal form; if m < n, sub(A) is reduced to lowerbidiagonal form.

This is an auxiliary routine called by p?gebrd.

Input Parameters


of rows of the distributed submatrix sub(A). (m ≥ 0).


submatrix sub(A). (n ≥ 0).

n

1935

(global) INTEGER.nbThe number of leading rows and columns of sub(A) to bereduced.

(local).aREAL for pslabrdDOUBLE PRECISION for pdlabrdCOMPLEX for pclabrdCOMPLEX*16 for pzlabrd.Pointer into the local memory to an array ofDIMENSION(lld_a, LOCc(ja+n-1)).On entry, this array contains the local pieces of the generaldistributed matrix sub(A).


ia, ja


desca

(global) INTEGER. The row and column indices in the globalarray x indicating the first row and the first column of thesubmatrix sub(X), respectively.

ix, jx

(global and local) INTEGER array, DIMENSION (dlen_). Thearray descriptor for the distributed matrix X.

descx

(global) INTEGER. The row and column indices in the globalarray y indicating the first row and the first column of thesubmatrix sub(Y), respectively.

iy, jy

(global and local) INTEGER array, DIMENSION (dlen_). Thearray descriptor for the distributed matrix Y.

descy

(local).workREAL for pslabrdDOUBLE PRECISION for pdlabrdCOMPLEX for pclabrdCOMPLEX*16 for pzlabrdWorkspace array, DIMENSION(lwork)

lwork ≥ nb_a + nq,with nq = numroc( n + mod(ia-1, nb_y), nb_y,mycol, iacol, npcol )

1936


iacol = indxg2p (ja, nb_a, mycol, csrc_a, npcol)indxg2p and numroc are ScaLAPACK tool functions; myrow,mycol, nprow, and npcol can be determined by calling thesubroutine blacs_gridinfo.

Output Parameters

(local)aOn exit, the first nb rows and columns of the matrix areoverwritten; the rest of the distributed matrix sub(A) is

unchanged. if m ≥ n, elements on and below the diagonalin the first nb columns, with the array tauq, represent theorthogonal/unitary matrix Q as a product of elementaryreflectors;and elements above the diagonal in the first nbrows, with the array taup, represent the orthogonal/unitarymatrix P as a product of elementary reflectors.If m < n, elements below the diagonal in the first nbcolumns, with the array tauq, represent theorthogonal/unitary matrix Q as a product of elementaryreflectors, and elements on and above the diagonal in thefirst nb rows, with the array taup, represent theorthogonal/unitary matrix P as a product of elementaryreflectors. See Application Notes below.

(local).dREAL for pslabrdDOUBLE PRECISION for pdlabrdCOMPLEX for pclabrdCOMPLEX*16 for pzlabrd

Array, DIMENSION LOCr(ia+min(m,n)-1) if m ≥ n;LOCc(ja+min(m,n)-1) otherwise. The distributed diagonalelements of the bidiagonal distributed matrix B:d(i) = A(ia+i-1, ja+i-1).d is tied to the distributed matrix A.

(local).eREAL for pslabrdDOUBLE PRECISION for pdlabrdCOMPLEX for pclabrd

1937

COMPLEX*16 for pzlabrd

Array, DIMENSION LOCr(ia+min(m,n)-1) if m ≥ n;LOCc(ja+min(m,n)-2) otherwise. The distributedoff-diagonal elements of the bidiagonal distributed matrixB:

if m ≥ n, E(i) = A(ia+i-1, ja+i) for i = 1, 2, ...,n-1;if m < n,E(i) = A(ia+i, ja+i-1) for i = 1, 2, ...,m-1.e is tied to the distributed matrix A.

(local).tauq, taupREAL for pslabrdDOUBLE PRECISION for pdlabrdCOMPLEX for pclabrdCOMPLEX*16 for pzlabrdArray DIMENSION LOCc(ja+min(m, n)-1) for tauq,DIMENSION LOCr(ia+min(m, n)-1) for taup. The scalarfactors of the elementary reflectors which represent theorthogonal/unitary matrix Q for tauq, P for taup. tauq andtaup are tied to the distributed matrix A. See ApplicationNotes below.

(local)xREAL for pslabrdDOUBLE PRECISION for pdlabrdCOMPLEX for pclabrdCOMPLEX*16 for pzlabrdPointer into the local memory to an array of DIMENSION(lld_x, nb). On exit, the local pieces of the distributedm-by-nb matrix X(ix:ix+m-1, jx:jx+nb-1) required toupdate the unreduced part of sub(A).

(local).yREAL for pslabrdDOUBLE PRECISION for pdlabrdCOMPLEX for pclabrdCOMPLEX*16 for pzlabrd

1938

Pointer into the local memory to an array of DIMENSION(lld_y, nb). On exit, the local pieces of the distributedn-by-nb matrix Y(iy:iy+n-1, jy:jy+nb-1) required toupdate the unreduced part of sub(A).

Application Notes


Q = H(1) H(2) . . . H(nb) and P = G(1) G(2). . . G(nb)


H(i) = I - tauq *v *v' and G(i) = I - taup *u *u',

where tauq and taup are real/complex scalars, and v and u are real/complex vectors.

If m ≥ n, v(1: i-1 ) = 0, v(i) = 1, and v(i:m) is stored on exit in

A(ia+i-1:ia+m-1, ja+i-1); u(1:i) = 0, u(i+1 ) = 1, and u(i+1:n) is stored on exitin A(ia+i-1, ja+i:ja+n-1); tauq is stored in TAUQ(ja+i-1) and taup in TAUP(ia+i-1).

If m < n, v(1: i) = 0, v(i+1 ) = 1, and v(i+1:m) is stored on exit in

A(ia+i+1:ia+m-1, ja+i-1); u(1:i-1 ) = 0, u(i) = 1, and u(i:n) is stored on exit inA(ia+i-1, ja+i:ja+n-1); tauq is stored in TAUQ(ja+i-1) and taup in TAUP(ia+i-1). Theelements of the vectors v and u together form the m-by-nb matrix V and the nb-by-n matrix U'which are necessary, with X and Y, to apply the transformation to the unreduced part of thematrix, using a block update of the form: sub(A):= sub(A) - V*Y' - X*U'. The contentsof sub(A) on exit are illustrated by the following examples with nb = 2:

1939

where a denotes an element of the original matrix which is unchanged, vi denotes an elementof the vector defining H(i), and ui an element of the vector defining G(i).

p?laconEstimates the 1-norm of a square matrix, usingthe reverse communication for evaluatingmatrix-vector products.

Syntax

call pslacon(n, v, iv, jv, descv, x, ix, jx, descx, isgn, est, kase)

call pdlacon(n, v, iv, jv, descv, x, ix, jx, descx, isgn, est, kase)

call pclacon(n, v, iv, jv, descv, x, ix, jx, descx, isgn, est, kase)

call pzlacon(n, v, iv, jv, descv, x, ix, jx, descx, isgn, est, kase)

Description

This routine estimates the 1-norm of a square, real/unitary distributed matrix A. Reversecommunication is used for evaluating matrix-vector products. x and v are aligned with thedistributed matrix A, this information is implicitly contained within iv, ix, descv, and descx.

1940


Input Parameters

(global).INTEGER. The length of the distributed vectors v

and x. n ≥ 0.

n

(local).vREAL for pslaconDOUBLE PRECISION for pdlaconCOMPLEX for pclaconCOMPLEX*16 for pzlacon.Pointer into the local memory to an array of DIMENSIONLOCr(n+mod(iv-1, mb_v)). On the final return, v = a*w,where est = norm(v)/norm(w) (w is not returned).

(global) INTEGER. The row and column indices in the globalarray v indicating the first row and the first column of thesubmatrix V, respectively.

iv, jv

(global and local) INTEGER array, DIMENSION (dlen_). Thearray descriptor for the distributed matrix V.

descv

(local).xREAL for pslaconDOUBLE PRECISION for pdlaconCOMPLEX for pclaconCOMPLEX*16 for pzlacon.Pointer into the local memory to an array of DIMENSIONLOCr(n+mod(ix-1, mb_x)).

(global) INTEGER. The row and column indices in the globalarray x indicating the first row and the first column of thesubmatrix X, respectively.

ix, jx

(global and local) INTEGER array, DIMENSION (dlen_). Thearray descriptor for the distributed matrix X.

descx

(local). INTEGER.isgnArray, DIMENSION LOCr(n+mod(ix-1, mb_x)). isgn isaligned with x and v.

(local). INTEGER.kaseOn the initial call to p?lacon, kase should be 0.

1941


Output Parameters

(local).xOn an intermediate return, X should be overwritten by A*X,if kase=1, A' *X, if kase=2,p?lacon must be re-called with all the other parametersunchanged.

(global). REAL for single precision flavorsestDOUBLE PRECISION for double precision flavors

(local)kaseINTEGER. On an intermediate return, kase is 1 or 2,indicating whether X should be overwritten by A*X, or A'*X.On the final return from p?lacon, kase is again 0.

p?laconsbLooks for two consecutive small subdiagonalelements.

Syntax

call pslaconsb(a, desca, i, l, m, h44, h33, h43h34, buf, lwork)

call pdlaconsb(a, desca, i, l, m, h44, h33, h43h34, buf, lwork)

Description

This routine looks for two consecutive small subdiagonal elements by analyzing the effect ofstarting a double shift QR iteration given by h44, h33, and h43h34 to see if this process makesa subdiagonal negligible.

Input Parameters

(global). REAL for pslaconsbaDOUBLE PRECISION for pdlaconsbArray, DIMENSION (desca (lld_),*). On entry, theHessenberg matrix whose tridiagonal part is being scanned.Unchanged on exit.

(global and local) INTEGER.descaArray of DIMENSION (dlen_). The array descriptor for thedistributed matrix A.

1942


(global) INTEGER.iThe global location of the bottom of the unreduced submatrixof A. Unchanged on exit.

(global) INTEGER.lThe global location of the top of the unreduced submatrixof A. Unchanged on exit.

(global). REAL for pslaconsbh44, h33, h43h34DOUBLE PRECISION for pdlaconsbThese three values are for the double shift QR iteration.

(global) INTEGER.lworkThis must be at least 7*ceil(ceil( (i-l)/hbl)/lcm(nprow, npcol)). Here lcm is least common multipleand nprowxnpcol is the logical grid size.

Output Parameters

(global). On exit, this yields the starting location of the QRdouble shift. This will satisfy:

m

l ≤ m ≤ i-2.

(local).bufREAL for pslaconsbDOUBLE PRECISION for pdlaconsbArray of size lwork.

(global). On exit, lwork is the size of the work buffer.lwork

p?lacp2Copies all or part of a distributed matrix to anotherdistributed matrix.

Syntax

call pslacp2(uplo, m, n, a, ia, ja, desca, b, ib, jb, descb)

call pdlacp2(uplo, m, n, a, ia, ja, desca, b, ib, jb, descb)

call pclacp2(uplo, m, n, a, ia, ja, desca, b, ib, jb, descb)

call pzlacp2(uplo, m, n, a, ia, ja, desca, b, ib, jb, descb)

1943


Description

This routine copies all or part of a distributed matrix A to another distributed matrix B. Nocommunication is performed, p?lacp2 performs a local copy sub(A):= sub(B), where sub(A)denotes A(ia:ia+m-1, a:ja+n-1) and sub(B) denotes B(ib:ib+m-1, jb:jb+n-1).

p?lacp2 requires that only dimension of the matrix operands is distributed.

Input Parameters

(global) CHARACTER. Specifies the part of the distributedmatrix sub(A) to be copied:

uplo

= 'U': Upper triangular part is copied; the strictly lowertriangular part of sub(A) is not referenced;= 'L': Lower triangular part is copied; the strictly uppertriangular part of sub(A) is not referenced.Otherwise: all of the matrix sub(A) is copied.



(global) INTEGER.nThe number of columns to be operated on, that is, thenumber of columns of the distributed submatrix sub(A). (n

≥ 0).

(local).aREAL for pslacp2DOUBLE PRECISION for pdlacp2COMPLEX for pclacp2COMPLEX*16 for pzlacp2.Pointer into the local memory to an array ofDIMENSION(lld_a, LOCc(ja+n-1)).On entry, this array contains the local pieces of the m-by-ndistributed matrix sub(A).

(global) INTEGER. The row and column indices in the globalarray A indicating the first row and the first column ofsub(A), respectively.

ia, ja


desca

1944


(global) INTEGER. The row and column indices in the globalarray B indicating the first row and the first column ofsub(B), respectively.

ib, jb

(global and local) INTEGER array, DIMENSION (dlen_). Thearray descriptor for the distributed matrix B.

descb

Output Parameters

(local).bREAL for pslacp2DOUBLE PRECISION for pdlacp2COMPLEX for pclacp2COMPLEX*16 for pzlacp2.Pointer into the local memory to an array of DIMENSION(lld_b, LOCc(jb+n-1) ). This array contains on exit the localpieces of the distributed matrix sub( B ) set as follows:if uplo = 'U', B(ib+i-1, jb+j-1) = A(ia+i-1, ja+j-1),

1 ≤ i ≤ j, 1 ≤ j ≤ n;if uplo = 'L', B(ib+i-1, jb+j-1) = A(ia+i-1, ja+j-1),

j ≤ i ≤ m, 1≤ j ≤ n;

otherwise, B(ib+i-1, jb+j-1) = A(ia+i-1, ja+j-1), 1 ≤ i

≤ m, 1 ≤ j ≤ n.

p?lacp3Copies from a global parallel array into a localreplicated array or vice versa.

Syntax

call pslacp3(m, i, a, desca, b, ldb, ii, jj, rev)

call pdlacp3(m, i, a, desca, b, ldb, ii, jj, rev)

Description

This is an auxiliary routine that copies from a global parallel array into a local replicated arrayor vise versa. Note that the entire submatrix that is copied gets placed on one node or more.The receiving node can be specified precisely, or all nodes can receive, or just one row orcolumn of nodes.

1945


Input Parameters

(global) INTEGER.mm is the order of the square submatrix that is copied.

m ≥ 0. Unchanged on exit.

(global) INTEGER. A(i, i) is the global location that thecopying starts from. Unchanged on exit.

i

(global). REAL for pslacp3aDOUBLE PRECISION for pdlacp3Array, DIMENSION (desca(lld_),*). On entry, the parallelmatrix to be copied into or from.


desca

(local).bREAL for pslacp3DOUBLE PRECISION for pdlacp3Array, DIMENSION (ldb, m). If rev = 0, this is the globalportion of the array A(i:i+m-1, i:i+m-1). If rev = 1,this is the unchanged on exit.

(local)ldbINTEGER.The leading dimension of B.

(global) INTEGER. By using rev 0 and 1, data can be sentout and returned again. If rev = 0, then ii is destinationrow index for the node(s) receiving the replicated B. If ii

ii

≥ 0, jj ≥ 0, then node (ii, jj) receives the data. If ii =

-1, jj ≥ 0, then all rows in column jj receive the data. If

ii ≥ 0, jj = -1, then all cols in row ii receive the data. fii = -1, jj = -1, then all nodes receive the data. If rev!=0, then ii is the source row index for the node(s) sendingthe replicated B.

(global) INTEGER. Similar description as ii above.jj

(global) INTEGER. Use rev = 0 to send global A into locallyreplicated B (on node (ii, jj)). Use rev != 0 to send locallyreplicated B from node (ii, jj) to its owner (which changesdepending on its location in A) into the global A.

rev

1946


Output Parameters

(global). On exit, if rev = 1, the copied data. Unchangedon exit if rev = 0.

a

(local). If rev = 1, this is unchanged on exit.b

p?lacpyCopies all or part of one two-dimensional array toanother.

Syntax

call pslacpy(uplo, m, n, a, ia, ja, desca, b, ib, jb, descb)

call pdlacpy(uplo, m, n, a, ia, ja, desca, b, ib, jb, descb)

call pclacpy(uplo, m, n, a, ia, ja, desca, b, ib, jb, descb)

call pzlacpy(uplo, m, n, a, ia, ja, desca, b, ib, jb, descb)

Description

This routine copies all or part of a distributed matrix A to another distributed matrix B. Nocommunication is performed, p?lacpy performs a local copy sub(A):= sub(B), where sub(A)denotes A(ia:ia+m-1,ja:ja+n-1) and sub(B) denotes B(ib:ib+m-1,jb:jb+n-1).

Input Parameters

(global). CHARACTER. Specifies the part of the distributedmatrix sub(A) to be copied:

uplo

= 'U': Upper triangular part is copied; the strictly lowertriangular part of sub(A) is not referenced;= 'L': Lower triangular part is copied; the strictly uppertriangular part of sub(A) is not referenced.Otherwise: all of the matrix sub(A) is copied.


of rows of the distributed submatrix sub(A). (m ≥0).

(global) INTEGER.n

1947


The number of columns to be operated on, that is, thenumber of columns of the distributed submatrix sub(A).

(n≥0).

(local).aREAL for pslacpyDOUBLE PRECISION for pdlacpyCOMPLEX for pclacpyCOMPLEX*16 for pzlacpy.Pointer into the local memory to an array ofDIMENSION(lld_a, LOCc(ja+n-1)).On entry, this array contains the local pieces of thedistributed matrix sub(A).


ia, ja


desca

(global) INTEGER. The row and column indices in the globalarray B indicating the first row and the first column of sub(B)respectively.

ib, jb


descb

Output Parameters

(local).bREAL for pslacpyDOUBLE PRECISION for pdlacpyCOMPLEX for pclacpyCOMPLEX*16 for pzlacpy.Pointer into the local memory to an array of DIMENSION(lld_b, LOCc(jb+n-1) ). This array contains on exit the localpieces of the distributed matrix sub( B ) set as follows:if uplo = 'U', B(ib+i-1, jb+j-1) = A(ia+i-1, ja+j-1),

1≤i≤j, 1≤j≤n;if uplo = 'L', B(ib+i-1, jb+j-1) =

A(ia+i-1, ja+j-1),j≤i≤m, 1≤j≤n;

1948


otherwise, B(ib+i-1, jb+j-1) = A(ia+i-1, ja+j-1), 1≤i≤m,

1≤j≤n.

p?laevswpMoves the eigenvectors from where they arecomputed to ScaLAPACK standard block cyclicarray.

Syntax

call pslaevswp(n, zin, ldzi, z, iz, jz, descz, nvs, key, rwork, lrwork)

call pdlaevswp(n, zin, ldzi, z, iz, jz, descz, nvs, key, rwork, lrwork)

call pclaevswp(n, zin, ldzi, z, iz, jz, descz, nvs, key, rwork, lrwork)

call pzlaevswp(n, zin, ldzi, z, iz, jz, descz, nvs, key, rwork, lrwork)

Description

This routine moves the eigenvectors (potentially unsorted) from where they are computed, toa ScaLAPACK standard block cyclic array, sorted so that the corresponding eigenvalues aresorted.

Input Parameters



(global). INTEGER.n

The order of the matrix A. n ≥ 0.

(local).zinREAL for pslaevswpDOUBLE PRECISION for pdlaevswpCOMPLEX for pclaevswpCOMPLEX*16 for pzlaevswp. Array, DIMENSION (ldzi,nvs(iam) ). The eigenvectors on input. Each eigenvectorresides entirely in one process. Each process holds acontiguous set of nvs(iam) eigenvectors. The firsteigenvector which the process holds is: sum for i=[0, iam-1)of nvs(i).

1949


(local)ldziINTEGER.The leading dimension of the zin array.

(global) INTEGER.The row and column indices in the globalarray Z indicating the first row and the first column of thesubmatrix Z, respectively.

iz, jz

(global and local) INTEGER array, DIMENSION (dlen_). Thearray descriptor for the distributed matrix Z.

descz

(global) INTEGER.nvsArray, DIMENSION( nprocs+1)nvs(i) = number of processes number of eigenvectors heldby processes [0, i-1)nvs(1) = number of eigen vectors held by[0, 1 -1) = 0nvs(nprocs+1)= number of eigen vectors held by [0,nprocs)= total number of eigenvectors.

(global) INTEGER.keyArray, DIMENSION (n). Indicates the actual index (aftersorting) for each of the eigenvectors.

(local).rworkREAL for pslaevswpDOUBLE PRECISIONfor pdlaevswpCOMPLEX for pclaevswpCOMPLEX*16 for pzlaevswp. Array, DIMENSION (lrwork).

(local)lrworkINTEGER. Dimension of work.

Output Parameters

(local).zREAL for pslaevswpDOUBLE PRECISION for pdlaevswpCOMPLEX for pclaevswpCOMPLEX*16 for pzlaevswp.Array, global DIMENSION (n, n), local DIMENSION(descz(dlen_), nq). The eigenvectors on output. Theeigenvectors are distributed in a block cyclic manner in bothdimensions, with a block size of nb.

1950


p?lahrdReduces the first nb columns of a generalrectangular matrix A so that elements below thek-th subdiagonal are zero, by an orthogonal/unitarytransformation, and returns auxiliary matrices thatare needed to apply the transformation to theunreduced part of A.

Syntax

call pslahrd(n, k, nb, a, ia, ja, desca, tau, t, y, iy, jy, descy, work)

call pdlahrd(n, k, nb, a, ia, ja, desca, tau, t, y, iy, jy, descy, work)

call pclahrd(n, k, nb, a, ia, ja, desca, tau, t, y, iy, jy, descy, work)

call pzlahrd(n, k, nb, a, ia, ja, desca, tau, t, y, iy, jy, descy, work)

Description

The routines reduces the first nb columns of a real general n-by-(n-k+1) distributed matrixA(ia:ia+n-1 , ja:ja+n-k) so that elements below the k-th subdiagonal are zero. Thereduction is performed by an orthogonal/unitary similarity transformation Q' * A * Q. Theroutine returns the matrices V and T which determine Q as a block reflector I - V*T*V', andalso the matrix Y = A * V * T.

This is an auxiliary routine called by p?gehrd. In the following comments sub(A) denotesA(ia:ia+n-1, ja:ja+n-1).

Input Parameters

(global) INTEGER.n

The order of the distributed submatrix sub(A). n ≥ 0.

(global) INTEGER.kThe offset for the reduction. Elements below the k-thsubdiagonal in the first nb columns are reduced to zero.

(global) INTEGER.nbThe number of columns to be reduced.

(local).aREAL for pslahrdDOUBLE PRECISION for pdlahrd

1951


COMPLEX for pclahrdCOMPLEX*16 for pzlahrd.Pointer into the local memory to an array of DIMENSION(lld_a, LOCc(ja+n-k)). On entry, this array contains thelocal pieces of the n-by-(n-k+1) general distributed matrixA(ia:ia+n-1, ja:ja+n-k).


ia, ja


desca

(global) INTEGER. The row and column indices in the globalarray Y indicating the first row and the first column of thesubmatrix sub(Y), respectively.

iy, jy

(global and local) INTEGER array, DIMENSION (dlen_). Thearray descriptor for the distributed matrix Y.

descy

(local).workREAL for pslahrdDOUBLE PRECISION for pdlahrdCOMPLEX for pclahrdCOMPLEX*16 for pzlahrd.Array, DIMENSION (nb).

Output Parameters

(local).aOn exit, the elements on and above the k-th subdiagonalin the first nb columns are overwritten with thecorresponding elements of the reduced distributedmatrix;the elements below the k-th subdiagonal, with thearray tau, represent the matrix Q as a product of elementaryreflectors. The other columns of A(ia:ia+n-1, ja:ja+n-k)are unchanged. See Application Notes below.

(local)tauREAL for pslahrdDOUBLE PRECISION for pdlahrdCOMPLEX for pclahrd

1952


COMPLEX*16 for pzlahrd.Array, DIMENSION LOCc(ja+n-2). The scalar factors of theelementary reflectors (see Application Notes below). tau istied to the distributed matrix A.

(local)REAL for pslahrdtDOUBLE PRECISION for pdlahrdCOMPLEX for pclahrdCOMPLEX*16 for pzlahrd.Array, DIMENSION (nb_a, nb_a) The upper triangular matrixT.

(local).yREAL for pslahrdDOUBLE PRECISION for pdlahrdCOMPLEX for pclahrdCOMPLEX*16 for pzlahrd.Pointer into the local memory to an array of DIMENSION(lld_y, nb_a). On exit, this array contains the local pieces

of the n-by-nb distributed matrix Y. lld_y ≥ LOCr(ia+n-1).

Application Notes

The matrix Q is represented as a product of nb elementary reflectors

Q = H(1) H(2) . . . H(nb).


H(i) = i - tau *v * v',

where tau is a real/complex scalar, and v is a real/complex vector with v(1: i+k-1)= 0, v(i+k)=1; v(i+k+1:n) is stored on exit in A(ia+i+k:ia+n-1, ja+i-1), and tau in TAU(ja+i-1).

The elements of the vectors v together form the (n-k+1)-by-nb matrix V which is needed, withT and Y, to apply the transformation to the unreduced part of the matrix, using an update ofthe form: A(ia:ia+n-1, ja:ja+n-k) := (I-V*T*V')*(A(ia:ia+n-1, ja:ja+n-k)-Y*V'). Thecontents of A(ia:ia+n-1, ja:ja+n-k) on exit are illustrated by the following example with n= 7, k = 3, and nb = 2:

1953


where a denotes an element of the original matrix A(ia:ia+n-1, ja:ja+n-k), h denotes amodified element of the upper Hessenberg matrix H, and vi denotes an element of the vectordefining H(i).

p?laiectExploits IEEE arithmetic to accelerate thecomputations of eigenvalues. (C interface function).

Syntax

void pslaiect(float *sigma, int *n, float *d, int *count);

void pdlaiectb(float *sigma, int *n, float *d, int *count);

void pdlaiectl(float *sigma, int *n, float *d, int *count);

Description

This routine computes the number of negative eigenvalues of (A- σI). This implementation ofthe Sturm Sequence loop exploits IEEE arithmetic and has no conditionals in the innermostloop. The signbit for real routine pslaiect is assumed to be bit 32. Double precision routinespdlaiectb and pdlaiectl differ in the order of the double precision word storage and,consequently, in the signbit location. For pdlaiectb, the double precision word is stored inthe big-endian word order and the signbit is assumed to be bit 32. For pdlaiectl, the doubleprecision word is stored in the little-endian word order and the signbit is assumed to be bit 64.

1954


Note that all arguments are call-by-reference so that this routine can be directly called fromFortran code.

This is a ScaLAPACK internal subroutine and arguments are not checked for unreasonablevalues.

Input Parameters

Realfor pslaiectsigmaDOUBLE PRECISIONfor pdlaiectb/pdlaiectl.The shift. p?laiect finds the number of eigenvalues lessthan equal to sigma.

INTEGER. The order of the tridiagonal matrix T. n ≥ 1.n

Real for pslaiectdDOUBLE PRECISION for pdlaiectb/pdlaiectl.Array of DIMENSION(2n -1).On entry, this array contains the diagonals and the squaresof the off-diagonal elements of the tridiagonal matrix T.These elements are assumed to be interleaved in memoryfor better cache performance. The diagonal entries of T arein the entries d(1), d(3),..., d(2n-1), while thesquares of the off-diagonal entries are d(2), d(4), ...,d(2n-2). To avoid overflow, the matrix must be scaled sothat its largest entry is no greater than overflow(1/2) *underflow(1/4) in absolute value, and for greatest accuracy,it should not be much smaller than that.

Output Parameters

INTEGER. The count of the number of eigenvalues of T lessthan or equal to sigma.

n

1955


p?langeReturns the value of the 1-norm, Frobenius norm,infinity-norm, or the largest absolute value of anyelement, of a general rectangular matrix.

Syntax

val = pslange(norm, m, n, a, ia, ja, desca, work)

val = pdlange(norm, m, n, a, ia, ja, desca, work)

val = pclange(norm, m, n, a, ia, ja, desca, work)

val = pzlange(norm, m, n, a, ia, ja, desca, work)

Description

The function returns the value of the 1-norm, or the Frobenius norm, or the infinity norm, orthe element of largest absolute value of a distributed matrix sub(A) = A(ia:ia+m-1,ja:ja+n-1).

p?lange returns the value

( max(abs(A(i,j))), norm = 'M' or 'm' with ia ≤ i ≤ ia+m-1,

( and ja ≤ j ≤ ja+n-1,

(

( norm1( sub(A)), norm = '1', 'O' or 'o'

(

( normI( sub(A)), norm = 'I' or 'i'

(

( normF( sub(A)), norm = 'F', 'f', 'E' or 'e',

where norm1 denotes the 1-norm of a matrix (maximum column sum), normI denotes theinfinity norm of a matrix (maximum row sum) and normF denotes the Frobenius norm of amatrix (square root of sum of squares). Note that max(abs(A(i,j))) is not a matrix norm.

1956


Input Parameters

(global) CHARACTER. Specifies the value to be returned bythe routine as described above.

norm

(global). INTEGER.mThe number of rows to be operated on, that is, the numberof rows of the distributed submatrix sub(A). When m = 0,

p?lange is set to zero. m ≥ 0.

(global). INTEGER.nThe number of columns to be operated on, that is, thenumber of columns of the distributed submatrix sub(A).

When n = 0, p?lange is set to zero. n ≥ 0.

(local).aReal for pslangeDOUBLE PRECISION for pdlangeCOMPLEX for pclangeCOMPLEX*16 for pzlange.Pointer into the local memory to an array of DIMENSION(lld_a, LOCc(ja+n-1)) containing the local pieces of thedistributed matrix sub(A).


ia, ja


desca

(local).workReal for pslangeDOUBLE PRECISION for pdlangeCOMPLEX for pclangeCOMPLEX*16 for pzlange.Array DIMENSION (lwork).

lwork ≥ 0 if norm = 'M' or 'm' (not referenced),nq0 if norm = '1', 'O' or 'o',mp0 if norm = 'I' or 'i',0 if norm = 'F', 'f', 'E' or 'e' (not referenced),where

1957


iroffa = mod( ia-1, mb_a ), icoffa = mod( ja-1,nb_a ),iarow = indxg2p( ia, mb_a, myrow, rsrc_a, nprow),iacol = indxg2p( ja, nb_a, mycol, csrc_a, npcol),mp0 = numroc( m+iroffa, mb_a, myrow, iarow,nprow ),nq0 = numroc( n+icoffa, nb_a, mycol, iacol,npcol ),indxg2p and numroc are ScaLAPACK tool functions; myrow,mycol, nprow, and npcol can be determined by calling thesubroutine blacs_gridinfo.

Output Parameters

The value returned by the fuction.val

p?lanhsReturns the value of the 1-norm, Frobenius norm,infinity-norm, or the largest absolute value of anyelement, of an upper Hessenberg matrix.

Syntax

val = pslanhs(norm, n, a, ia, ja, desca, work)

val = pdlanhs(norm, n, a, ia, ja, desca, work)

val = pclanhs(norm, n, a, ia, ja, desca, work)

val = pzlanhs(norm, n, a, ia, ja, desca, work)

Description

The function returns the value of the 1-norm, or the Frobenius norm, or the infinity norm, orthe element of largest absolute value of a distributed matrix sub(A) = A(ia:ia+m-1,ja:ja+n-1).

p?lanhs returns the value


1958



(


(

(normI( sub(A)), norm = 'I' or 'i'

(

(normF( sub(A)), norm = 'F', 'f', 'E' or 'e',


Input Parameters


norm


When n = 0, p?lanhs is set to zero. n ≥ 0.

(local).aReal for pslanhsDOUBLE PRECISION for pdlanhsCOMPLEX for pclanhsCOMPLEX*16 for pzlanhs.Pointer into the local memory to an array ofDIMENSION(lld_a, LOCc(ja+n-1)) containing the localpieces of the distributed matrix sub(A).

(global) INTEGER.ia, jaThe row and column indices in the global array A indicatingthe first row and the first column of the submatrix sub(A),respectively.


desca

(local).work

1959


Real for pslanhsDOUBLE PRECISION for pdlanhsCOMPLEX for pclanhsCOMPLEX*16 for pzlanh.Array, DIMENSION (lwork).

lwork ≥ 0 if norm = 'M' or 'm' (not referenced),nq0 if norm = '1', 'O' or 'o',mp0 if norm = 'I' or 'i',0 if norm = 'F', 'f', 'E' or 'e' (not referenced),whereiroffa = mod( ia-1, mb_a ), icoffa = mod( ja-1, nb_a ),iarow = indxg2p( ia, mb_a, myrow, rsrc_a, nprow ),iacol = indxg2p( ja, nb_a, mycol, csrc_a, npcol ),mp0 = numroc( m+iroffa, mb_a, myrow, iarow, nprow ),nq0 = numroc( n+icoffa, nb_a, mycol, iacol, npcol ),indxg2p and numroc are ScaLAPACK tool functions; myrow,imycol, nprow, and npcol can be determined by calling thesubroutine blacs_gridinfo.

Output Parameters


1960


p?lansy, p?lanheReturns the value of the 1-norm, Frobenius norm,infinity-norm, or the largest absolute value of anyelement, of a real symmetric or a complexHermitian matrix.

Syntax

val = pslansy(norm, uplo, n, a, ia, ja, desca, work)

val = pdlansy(norm, uplo, n, a, ia, ja, desca, work)

val = pclansy(norm, uplo, n, a, ia, ja, desca, work)

val = pzlansy(norm, uplo, n, a, ia, ja, desca, work)

val = pclanhe(norm, uplo, n, a, ia, ja, desca, work)

val = pzlanhe(norm, uplo, n, a, ia, ja, desca, work)

Description

The functions return the value of the 1-norm, or the Frobenius norm, or the infinity norm, orthe element of largest absolute value of a distributed matrix sub(A) = A(ia:ia+m-1,ja:ja+n-1).

p?lansy, p?lanhe return the value



(


(


(



1961


Input Parameters


norm

(global) CHARACTER. Specifies whether the upper or lowertriangular part of the symmetric matrix sub(A) is to bereferenced.

uplo

= 'U': Upper triangular part of sub(A) is referenced,= 'L': Lower triangular part of sub(A) is referenced.

(global) INTEGER.nThe number of columns to be operated on i.e the numberof columns of the distributed submatrix sub(A). When n =

0, p?lansy is set to zero. n ≥ 0.

(local).aReal for pslansyDOUBLE PRECISION for pdlansyCOMPLEX for pclansy, pclanheCOMPLEX*16 for pzlansy, pzlanhe.Pointer into the local memory to an array of DIMENSION(lld_a, LOCc(ja+n-1)) containing the local pieces of thedistributed matrix sub(A).If uplo = 'U', the leading n-by-n upper triangular part ofsub(A) contains the upper triangular matrix whose norm isto be computed, and the strictly lower triangular part of thismatrix is not referenced. If uplo = 'L', the leading n-by-nlower triangular part of sub(A) contains the lower triangularmatrix whose norm is to be computed, and the strictly uppertriangular part of sub(A) is not referenced.


ia, ja


desca

(local).workReal for pslansyDOUBLE PRECISION for pdlansyCOMPLEX for pclansy, pclanhe

1962


COMPLEX*16 for pzlansy, pzlanhe.Array DIMENSION (lwork).

lwork ≥ 0 if norm = 'M' or 'm' (not referenced),2*nq0+np0+ldw if norm = '1', 'O' or 'o', 'I' or 'i',where ldw is given by:if( nprow.ne.npcol ) thenldw = mb_a*ceil(ceil(np0/mb_a)/(lcm/nprow))elseldw = 0end if0 if norm = 'F', 'f', 'E' or 'e' (not referenced),where lcm is the least common multiple of nprow andnpcol, lcm = ilcm( nprow, npcol ) and ceil denotesthe ceiling operation (iceil).iroffa = mod( ia-1, mb_a ), icoffa = mod( ja-1, nb_a ),iarow = indxg2p( ia, mb_a, myrow, rsrc_a, nprow ),iacol = indxg2p( ja, nb_a, mycol, csrc_a, npcol ),mp0 = numroc( m+iroffa, mb_a, myrow, iarow, nprow ),nq0 = numroc( n+icoffa, nb_a, mycol, iacol, npcol ),indxg2p and numroc are ScaLAPACK tool functions; myrow,mycol, nprow, and npcol can be determined by calling thesubroutine blacs_gridinfo.

Output Parameters


1963


p?lantrReturns the value of the 1-norm, Frobenius norm,infinity-norm, or the largest absolute value of anyelement, of a triangular matrix.

Syntax

val = pslantr(norm, uplo, diag, m, n, a, ia, ja, desca, work)

val = pdlantr(norm, uplo, diag, m, n, a, ia, ja, desca, work)

val = pclantr(norm, uplo, diag, m, n, a, ia, ja, desca, work)

val = pzlantr(norm, uplo, diag, m, n, a, ia, ja, desca, work)

Description

The function returns the value of the 1-norm, or the Frobenius norm, or the infinity norm, orthe element of largest absolute value of a trapezoidal or triangular distributed matrix sub(A)= A(ia:ia+m-1, ja:ja+n-1).

p?lantr returns the value



(


(


(



1964


Input Parameters


norm

(global) CHARACTER.uploSpecifies whether the upper or lower triangular part of thesymmetric matrix sub(A) is to be referenced.= 'U': Upper trapezoidal,= 'L': Lower trapezoidal.Note that sub(A) is triangular instead of trapezoidal if m =n.

(global). CHARACTER.diagSpecifies whether or not the distributed matrix sub(A) hasunit diagonal.= 'N': Non-unit diagonal.= 'U': Unit diagonal.

(global) INTEGER.mThe number of rows to be operated on, that is, the numberof rows of the distributed submatrix sub(A). When m = 0,

p?lantr is set to zero. m ≥ 0.

(global) INTEGER.nThe number of columns to be operated on i.e the numberof columns of the distributed submatrix sub(A). When n =

0, p?lantr is set to zero. n ≥ 0.

(local).aReal for pslantrDOUBLE PRECISION for pdlantrCOMPLEX for pclantrCOMPLEX*16 for pzlantr.Pointer into the local memory to an array ofDIMENSION(lld_a, LOCc(ja+n-1)) containing the localpieces of the distributed matrix sub(A).

(global) INTEGER.ia, jaThe row and column indices in the global array a indicatingthe first row and the first column of the submatrix sub(A),respectively.

1965



desca

(local).workReal for pslantrDOUBLE PRECISION for pdlantrCOMPLEX for pclantrCOMPLEX*16 for pzlantr.Array DIMENSION (lwork).

lwork ≥ 0 if norm = 'M' or 'm' (not referenced),nq0 if norm = '1', 'O' or 'o',mp0 if norm = 'I' or 'i',0 if norm = 'F', 'f', 'E' or 'e' (not referenced),where lcm is the least common multiple of nprow and npcollcm = ilcm( nprow, npcol ) and ceil denotes theceiling operation (iceil).iroffa = mod( ia-1, mb_a ), icoffa = mod( ja-1,nb_a ),iarow = indxg2p( ia, mb_a, myrow, rsrc_a, nprow),iacol = indxg2p( ja, nb_a, mycol, csrc_a, npcol),mp0 = numroc( m+iroffa, mb_a, myrow, iarow,nprow ),nq0 = numroc( n+icoffa, nb_a, mycol, iacol,npcol ),indxg2p and numroc are ScaLAPACK tool functions; myrow,mycol, nprow, and npcol can be determined by calling thesubroutine blacs_gridinfo.

Output Parameters


1966


p?lapivApplies a permutation matrix to a generaldistributed matrix, resulting in row or columnpivoting.

Syntax

call pslapiv(direc, rowcol, pivroc, m, n, a, ia, ja, desca, ipiv, ip, jp,descip, iwork)

call pdlapiv(direc, rowcol, pivroc, m, n, a, ia, ja, desca, ipiv, ip, jp,descip, iwork)

call pclapiv(direc, rowcol, pivroc, m, n, a, ia, ja, desca, ipiv, ip, jp,descip, iwork)

call pzlapiv(direc, rowcol, pivroc, m, n, a, ia, ja, desca, ipiv, ip, jp,descip, iwork)

Description

This routine applies either P (permutation matrix indicated by ipiv) or inv(P) to a generalm-by-n distributed matrix sub(A) = A(ia:ia+m-1, ja:ja+n-1), resulting in row or columnpivoting. The pivot vector may be distributed across a process row or a column. The pivot vectorshould be aligned with the distributed matrix A. This routine will transpose the pivot vector, ifnecessary.

For example, if the row pivots should be applied to the columns of sub(A), pass rowcol='C'and pivroc='C'.

Input Parameters

(global) CHARACTER*1.direcSpecifies in which order the permutation is applied:= 'F' (Forward). Applies pivots forward from top of matrix.Computes P*sub(A).= 'B' (Backward) Applies pivots backward from bottom ofmatrix.Computes inv(P)*sub(A).

(global) CHARACTER*1.rowcolSpecifies if the rows or columns are to be permuted:= 'R' Rows will be permuted,

1967


= 'C' Columns will be permuted.

(global) CHARACTER*1.pivrocSpecifies whether ipiv is distributed over a process row orcolumn:= 'R'ipiv is distributed over a process row,= 'C'ipiv is distributed over a process column.

(global) INTEGER.mThe number of rows to be operated on, that is, the numberof rows of the distributed submatrix sub(A). When m = 0,

p?lapiv is set to zero. m ≥ 0.


When n = 0, p?lapiv is set to zero. n ≥ 0.

(local).aReal for pslapivDOUBLE PRECISION for pdlapivCOMPLEX for pclapivCOMPLEX*16 for pzlapiv.Pointer into the local memory to an array ofDIMENSION(lld_a, LOCc(ja+n-1)) containing the localpieces of the distributed matrix sub(A).



desca

(local). INTEGER.ipivArray, DIMENSION (lipiv) ;when rowcol='R' or 'r':

lipiv≥LOCr(ia+m-1) + mb_a if pivroc='C' or 'c',

lipiv≥LOCc(m + mod(jp-1, nb_p)) if pivroc='R' or'r', and,when rowcol='C' or 'c':

1968


lipiv≥LOCr(n + mod(ip-1, mb_p)) if pivroc='C' or'c',

lipiv≥LOCc(ja+n-1) + nb_a if pivroc='R' or 'r'.This array contains the pivoting information. ipiv(i) is theglobal row (column), local row (column) i was swappedwith. When rowcol='R' or 'r' and pivroc='C' or 'c',or rowcol='C' or 'c' and pivroc='R' or 'r', the lastpiece of this array of size mb_a (resp. nb_a) is used asworkspace. In those cases, this array is tied to thedistributed matrix A.

(global) INTEGER. The row and column indices in the globalarray P indicating the first row and the first column of thesubmatrix sub(P), respectively.

ip, jp

(global and local) INTEGER array, DIMENSION (dlen_). Thearray descriptor for the distributed vector ipiv.

descip

(local). INTEGER.iworkArray, DIMENSION (ldw), where ldw is equal to theworkspace necessary for transposition, and the storage ofthe tranposed ipiv :

1969


Let lcm be the least common multiple of nprow and npcol.

If( rowcol.eq.'r' .and. pivroc. eq.'r') then

If( nprow.eq. npcol) then

ldw = LOCr( n_p + mod(jp-1, nb_p) ) + nb_p

else

ldw = LOCr( n_p + mod(jp-1, nb_p) )+

nb_p * ceil( ceil(LOCc(n_p)/nb_p) / (lcm/npcol))

end if

else if( rowcol.eq.'c' .and. pivroc.eq.'c')then

if( nprow.eq.

npcol ) then

ldw = LOCc( m_p + mod(ip-1, mb_p) ) + mb_p

else

ldw = LOCc( m_p + mod(ip-1, mb_p) ) +

mb_p *ceil(ceil(LOCr(m_p)/mb_p) / (lcm/nprow) )

end if

else

iwork is not referenced.

end if.

Output Parameters

(local).aOn exit, the local pieces of the permuted distributedsubmatrix.

1970


p?laqgeScales a general rectangular matrix, using row andcolumn scaling factors computed by p?geequ .

Syntax

call pslaqge(m, n, a, ia, ja, desca, r, c, rowcnd, colcnd, amax, equed)

call pdlaqge(m, n, a, ia, ja, desca, r, c, rowcnd, colcnd, amax, equed)

call pclaqge(m, n, a, ia, ja, desca, r, c, rowcnd, colcnd, amax, equed)

call pzlaqge(m, n, a, ia, ja, desca, r, c, rowcnd, colcnd, amax, equed)

Description

This routine equilibrates a general m-by-n distributed matrix sub(A) = A(ia:ia+m-1, ja:ja+n-1)using the row and scaling factors in the vectors r and c computed by p?geequ.

Input Parameters



(global).INTEGER.nThe number of columns to be operated on, that is, thenumber of columns of the distributed submatrix sub(A). (n

≥ 0).

(local).aREAL for pslaqgeDOUBLE PRECISION for pdlaqgeCOMPLEX for pclaqgeCOMPLEX*16 for pzlaqge.Pointer into the local memory to an array ofDIMENSION(lld_a, LOCc(ja+n-1)).On entry, this array contains the distributed matrix sub(A).


ia, ja

1971



desca

(local).rREAL for pslaqgeDOUBLE PRECISION for pdlaqgeCOMPLEX for pclaqgeCOMPLEX*16 for pzlaqge.Array, DIMENSION LOCr(m_a). The row scale factors forsub(A). r is aligned with the distributed matrix A, andreplicated across every process column. r is tied to thedistributed matrix A.

(local).cREAL for pslaqgeDOUBLE PRECISION for pdlaqgeCOMPLEX for pclaqgeCOMPLEX*16 for pzlaqge.Array, DIMENSIONLOCc(n_a). The row scale factors forsub(A). c is aligned with the distributed matrix A, andreplicated across every process column. c is tied to thedistributed matrix A.

(local).rowcndREAL for pslaqgeDOUBLE PRECISION for pdlaqgeCOMPLEX for pclaqgeCOMPLEX*16 for pzlaqge.The global ratio of the smallest r(i) to the largest r(i),

ia ≤ i ≤ ia+m-1.

(local).colcndREAL for pslaqgeDOUBLE PRECISION for pdlaqgeCOMPLEX for pclaqgeCOMPLEX*16 for pzlaqge.The global ratio of the smallest c(i) to the largest c(i),

ia ≤ i ≤ ia+n-1.

(global). REAL for pslaqgeamaxDOUBLE PRECISION for pdlaqge

1972


COMPLEX for pclaqgeCOMPLEX*16 for pzlaqge.Absolute value of largest distributed submatrix entry.

Output Parameters

(local).aOn exit, the equilibrated distributed matrix. See equed forthe form of the equilibrated distributed submatrix.

(global) CHARACTER.equedSpecifies the form of equilibration that was done.= 'N': No equilibration= 'R': Row equilibration, that is, sub(A) has beenpre-multiplied by diag(r(ia:ia+m-1)),= 'C': column equilibration, that is, sub(A) has beenpost-multiplied by diag(c(ja:ja+n-1)),= 'B': Both row and column equilibration, that is, sub(A)has been replaced by diag(r(ia:ia+m-1))* sub(A) *diag(c(ja:ja+n-1)).

p?laqsyScales a symmetric/Hermitian matrix, using scalingfactors computed by p?poequ .

Syntax

call pslaqsy(uplo, n, a, ia, ja, desca, sr, sc, scond, amax, equed)

call pdlaqsy(uplo, n, a, ia, ja, desca, sr, sc, scond, amax, equed)

call pclaqsy(uplo, n, a, ia, ja, desca, sr, sc, scond, amax, equed)

call pzlaqsy(uplo, n, a, ia, ja, desca, sr, sc, scond, amax, equed)

Description

This routine equilibrates a symmetric distributed matrix sub(A) = A(ia:ia+n-1, ja:ja+n-1)using the scaling factors in the vectors sr and sc. The scaling factors are computed by p?poequ.

1973


Input Parameters

(global) CHARACTER. Specifies the upper or lower triangularpart of the symmetric distributed matrix sub(A)is to bereferenced:

uplo

= 'U': Upper triangular part;= 'L': Lower triangular part.

(global) INTEGER.n

The order of the distributed submatrix sub(A). n ≥ 0.

(local).aREAL for pslaqsyDOUBLE PRECISION for pdlaqsyCOMPLEX for pclaqsyCOMPLEX*16 for pzlaqsy.Pointer into the local memory to an array of DIMENSION(lld_a,LOCc(ja+n-1)).On entry, this array contains the local pieces of thedistributed matrix sub(A). On entry, the local pieces of thedistributed symmetric matrix sub(A).If uplo = 'U', the leading n-by-n upper triangular part ofsub(A) contains the upper triangular part of the matrix, andthe strictly lower triangular part of sub(A) is not referenced.If uplo = 'L', the leading n-by-n lower triangular part ofsub(A) contains the lower triangular part of the matrix, andthe strictly upper triangular part of sub(A) is not referenced.



desca

(local)srREAL for pslaqsyDOUBLE PRECISION for pdlaqsyCOMPLEX for pclaqsyCOMPLEX*16 for pzlaqsy.

1974


Array, DIMENSION LOCr(m_a). The scale factors forA(ia:ia+m-1, ja:ja+n-1). sr is aligned with thedistributed matrix A, and replicated across every processcolumn. sr is tied to the distributed matrix A.

(local)scREAL for pslaqsyDOUBLE PRECISION for pdlaqsyCOMPLEX for pclaqsyCOMPLEX*16 for pzlaqsy.Array, DIMENSION LOCc(m_a). The scale factors for A(ia:ia+m-1, ja:ja+n-1). sr is aligned with the distributedmatrix A, and replicated across every process column. sris tied to the distributed matrix A.

(global). REAL for pslaqsyscondDOUBLE PRECISION for pdlaqsyCOMPLEX for pclaqsyCOMPLEX*16 for pzlaqsy.Ratio of the smallest sr(i) (respectively sc(j)) to the

largest sr(i) (respectively sc(j)), with ia ≤ i ≤ ia+n-1

and ja ≤ j ≤ ja+n-1.

(global).amaxREAL for pslaqsyDOUBLE PRECISION for pdlaqsyCOMPLEX for pclaqsyCOMPLEX*16 for pzlaqsy.Absolute value of largest distributed submatrix entry.

Output Parameters

On exit,aif equed = 'Y', the equilibrated matrix:diag(sr(ia:ia+n-1)) * sub(A) *diag(sc(ja:ja+n-1)).

(global) CHARACTER*1.equedSpecifies whether or not equilibration was done.= 'N': No equilibration.

1975


= 'Y': Equilibration was done, that is, sub(A) has beenreplaced by:diag(sr(ia:ia+n-1))* sub(A) *diag(sc(ja:ja+n-1)).

p?lared1dRedistributes an array assuming that the inputarray, bycol, is distributed across rows and thatall process columns contain the same copy ofbycol.

Syntax

call pslared1d(n, ia, ja, desc, bycol, byall, work, lwork)

call pdlared1d(n, ia, ja, desc, bycol, byall, work, lwork)

Description

This routine redistributes a 1D array. It assumes that the input array bycol is distributed acrossrows and that all process column contain the same copy of bycol. The output array byall isidentical on all processes and contains the entire array.

Input Parameters

np = Number of local rows in bycol()

(global). INTEGER.n

The number of elements to be redistributed. n ≥ 0.

(global) INTEGER. ia, ja must be equal to 1.ia, ja

(global and local) INTEGER array, DIMENSION 8. A 2D arraydescirptor, which describes bycol.

desc

(local).bycolREAL for pslared1dDOUBLE PRECISION for pdlared1dCOMPLEX for pclared1dCOMPLEX*16 for pzlared1d.Distributed block cyclic array global DIMENSION (n), localDIMENSION np. bycol is distributed across the process rows.All process columns are assumed to contain the same value.

1976


(local).workREAL for pslared1dDOUBLE PRECISION for pdlared1dCOMPLEX for pclared1dCOMPLEX*16 for pzlared1d.DIMENSION (lwork). Used to hold the buffers sent from oneprocess to another.

(local)lwork

INTEGER. The size of the work array. lwork ≥ numroc(n,desc( nb_), 0, 0, npcol).

Output Parameters

(global). REAL for pslared1dbyallDOUBLE PRECISION for pdlared1dCOMPLEX for pclared1dCOMPLEX*16 for pzlared1d.Global DIMENSION (n), local DIMENSION (n). byall is exactlyduplicated on all processes. It contains the same values asbycol, but it is replicated across all processes rather thanbeing distributed.

p?lared2dRedistributes an array assuming that the inputarray byrow is distributed across columns and thatall process rows contain the same copy of byrow.

Syntax

call pslared2d(n, ia, ja, desc, byrow, byall, work, lwork)

call pdlared2d(n, ia, ja, desc, byrow, byall, work, lwork)

Description

This routine redistributes a 1D array. It assumes that the input array byrow is distributed acrosscolumns and that all process rows contain the same copy of byrow. The output array byall willbe identical on all processes and will contain the entire array.

1977


Input Parameters

np = Number of local rows in byrow()

(global) INTEGER.n

The number of elements to be redistributed. n ≥ 0.

(global) INTEGER. ia, ja must be equal to 1.ia, ja

(global and local) INTEGER array, DIMENSION (dlen_). A2D array descirptor, which describes byrow.

desc

(local).byrowREAL for pslared2dDOUBLE PRECISION for pdlared2dCOMPLEX for pclared2dCOMPLEX*16 for pzlared2d.Distributed block cyclic array global DIMENSION (n), localDIMENSION np. bycol is distributed across the processcolumns. All process rows are assumed to contain the samevalue.

(local).workREAL for pslared2dDOUBLE PRECISION for pdlared2dCOMPLEX for pclared2dCOMPLEX*16 for pzlared2d.DIMENSION (lwork). Used to hold the buffers sent from oneprocess to another.

(local).INTEGER. The size of the work array. lwork ≥numroc(n, desc( nb_ ), 0, 0, npcol).

lwork

Output Parameters

(global). REAL for pslared2dbyallDOUBLE PRECISION for pdlared2dCOMPLEX for pclared2dCOMPLEX*16 for pzlared2d.Global DIMENSION(n), local DIMENSION (n). byall is exactlyduplicated on all processes. It contains the same values asbycol, but it is replicated across all processes rather thanbeing distributed.

1978


p?larfApplies an elementary reflector to a generalrectangular matrix.

Syntax

call pslarf(side, m, n, v, iv, jv, descv, incv, tau, c, ic, jc, descc, work)

call pdlarf(side, m, n, v, iv, jv, descv, incv, tau, c, ic, jc, descc, work)

call pclarf(side, m, n, v, iv, jv, descv, incv, tau, c, ic, jc, descc, work)

call pzlarf(side, m, n, v, iv, jv, descv, incv, tau, c, ic, jc, descc, work)

Description

This routine applies a real/complex elementary reflector Q (or QT) to a real/complex m-by-ndistributed matrix sub(C) = C(ic:ic+m-1, jc:jc+n-1), from either the left or the right. Qis represented in the form

Q = I - tau * v * v',


If tau = 0, then Q is taken to be the unit matrix.

Input Parameters

(global). CHARACTER.side= 'L': form Q * sub(C),= 'R': form sub(C)* Q, Q = QT.




≥ 0).

(local).vREAL for pslarfDOUBLE PRECISION for pdlarfCOMPLEX for pclarf

1979


COMPLEX*16 for pzlarf.Pointer into the local memory to an array of DIMENSION(lld_v,*) containing the local pieces of the distributedvectors V representing the Householder transformation Q,v(iv:iv+m-1, jv) if side = 'L' and incv = 1,v(iv, jv:jv+m-1) if side = 'L' and incv = m_v,v(iv:iv+n-1, jv) if side = 'R' and incv = 1,v(iv, jv:jv+n-1) if side = 'R' and incv = m_v.The vector v is the representation of Q. v is not used if tau= 0.

(global) INTEGER. The row and column indices in the globalarray V indicating the first row and the first column of thesubmatrix sub(V), respectively.

iv, jv


descv

(global) INTEGER.incvThe global increment for the elements of v. Only two valuesof incv are supported in this version, namely 1 and m_v.incv must not be zero.

(local).tauREAL for pslarfDOUBLE PRECISION for pdlarfCOMPLEX for pclarfCOMPLEX*16 for pzlarf.Array, DIMENSION LOCc(jv) if incv = 1, and LOCr(iv)otherwise. This array contains the Householder scalarsrelated to the Householder vectors.tau is tied to the distributed matrix v.

(local).cREAL for pslarfDOUBLE PRECISION for pdlarfCOMPLEX for pclarfCOMPLEX*16 for pzlarf.Pointer into the local memory to an array ofDIMENSION(lld_c, LOCc(jc+n-1) ), containing the localpieces of sub(C).

(global) INTEGER.ic, jc

1980


The row and column indices in the global array c indicatingthe first row and the first column of the submatrix sub(C),respectively.

(global and local) INTEGER array, DIMENSION (dlen_). Thearray descriptor for the distributed matrix C.

descc

(local).workREAL for pslarfDOUBLE PRECISION for pdlarfCOMPLEX for pclarfCOMPLEX*16 for pzlarf.Array, DIMENSION (lwork).

If incv = 1,

if side = 'L',

if ivcol = iccol,

lwork≥nqc0

else

lwork≥mpc0 + max( 1, nqc0 )

1981


end if

else if side = 'R' ,

lwork≥nqc0 + max( max( 1, mpc0), numroc(numroc(n+

icoffc,nb_v,0,0,npcol),nb_v,0,0,lcmq ) )

end if

else if incv = m_v,

if side = 'L',

lwork≥mpc0 + max( max( 1, nqc0 ), numroc(

numroc(m+iroffc,mb_v,0,0,nprow ),mb_v,0,0, lcmp) )

else if side = 'R',

if ivrow = icrow,

lwork≥mpc0

else

lwork≥nqc0 + max( 1, mpc0 )

end if

end if

end if,

where lcm is the least common multiple of nprow and npcoland lcm = ilcm( nprow, npcol ), lcmp = lcm/nprow, lcmq= lcm/npcol,iroffc = mod( ic-1, mb_c ), icoffc = mod( jc-1,nb_c ),icrow = indxg2p( ic, mb_c, myrow, rsrc_c, nprow),iccol = indxg2p( jc, nb_c, mycol, csrc_c, npcol),mpc0 = numroc( m+iroffc, mb_c, myrow, icrow,nprow ),

1982


nqc0 = numroc( n+icoffc, nb_c, mycol, iccol,npcol ),ilcm, indxg2p, and numroc are ScaLAPACK tool functions;myrow, mycol, nprow, and npcol can be determined bycalling the subroutine blacs_gridinfo.

Output Parameters

(local).cOn exit, sub(C) is overwritten by the Q * sub(C) if side= 'L',or sub(C) * Q if side = 'R'.

p?larfbApplies a block reflector or itstranspose/conjugate-transpose to a generalrectangular matrix.

Syntax

call pslarfb(side, trans, direct, storev, m, n, k, v, iv, jv, descv, t, c,ic, jc, descc, work)

call pdlarfb(side, trans, direct, storev, m, n, k, v, iv, jv, descv, t, c,ic, jc, descc, work)

call pclarfb(side, trans, direct, storev, m, n, k, v, iv, jv, descv, t, c,ic, jc, descc, work)

call pzlarfb(side, trans, direct, storev, m, n, k, v, iv, jv, descv, t, c,ic, jc, descc, work)

Description

This routine applies a real/complex block reflector Q or its transpose QT/conjugate transposeQH to a real/complex distributed m-by-n matrix sub(C) = C(ic:ic+m-1, jc:jc+n-1) from theleft or the right.

Input Parameters

(global).CHARACTER.side

1983


if side = 'L': apply Q or QT for real flavors (QH for complexflavors) from the Left;if side = 'R': apply Q or QTfor real flavors (QH for complexflavors) from the Right.

(global).CHARACTER.transif trans = 'N': no transpose, apply Q;for real flavors, if trans='T': transpose, apply QT

for complex flavors, if trans = 'C': conjugate transpose,apply QH;

(global) CHARACTER. Indicates how Q is formed from aproduct of elementary reflectors.

direct

if direct = 'F': Q = H(1) H(2) . . . H(k) (Forward)if direct = 'B': Q = H(k) . . . H(2) H(1) (Backward)

(global) CHARACTER.storevIndicates how the vectors that define the elementaryreflectors are stored:if storev = 'C': Columnwiseif storev = 'R': Rowwise.


of rows of the distributed submatrix sub(C). (m ≥ 0).

(global) INTEGER.nThe number of columns to be operated on, that is, thenumber of columns of the distributed submatrix sub(C). (n

≥ 0).

(global) INTEGER.kThe order of the matrix T.

(local).vREAL for pslarfbDOUBLE PRECISION for pdlarfbCOMPLEX for pclarfbCOMPLEX*16 for pzlarfb.Pointer into the local memory to an array of DIMENSION( lld_v, LOCc(jv+k-1)) if storev = 'C',( lld_v, LOCc(jv+m-1)) if storev = 'R' and side ='L',

1984


( lld_v, LOCc(jv+n-1) ) if storev = 'R' and side ='R'.It contains the local pieces of the distributed vectors Vrepresenting the Householder transformation.

if storev = 'C' and side = 'L', lld_v ≥max(1,LOCr(iv+m-1));

if storev = 'C' and side = 'R', lld_v ≥max(1,LOCr(iv+n-1));

if storev = 'R', lld_v ≥ LOCr(jv+k-1).

(global) INTEGER.iv, jvThe row and column indices in the global array V indicatingthe first row and the first column of the submatrix sub(V),respectively.


descv

(local).cREAL for pslarfbDOUBLE PRECISION for pdlarfbCOMPLEX for pclarfbCOMPLEX*16 for pzlarfb.Pointer into the local memory to an array ofDIMENSION(lld_c, LOCc(jc+n-1) ), containing the localpieces of sub(C).

(global) INTEGER. The row and column indices in the globalarray C indicating the first row and the first column of thesubmatrix sub(C), respectively.

ic, jc


descc

(local).workREAL for pslarfbDOUBLE PRECISION for pdlarfbCOMPLEX for pclarfbCOMPLEX*16 for pzlarfb.Workspace array, DIMENSION (lwork).

If storev = 'C',

1985


if side = 'L',

lwork ≥ ( nqc0 + mpc0 ) * k

else if side = 'R',

lwork ≥ ( nqc0 + max( npv0 + numroc( numroc( n+ icoffc,

nb_v, 0, 0, npcol ), nb_v, 0, 0, lcmq ),

mpc0 ) ) * k

end if

else if storev = 'R' ,

if side = 'L' ,

lwork ≥ ( mpc0 + max( mqv0 + numroc( numroc(m+iroffc,

mb_v, 0, 0, nprow ), mb_v, 0, 0, lcmp ),

nqc0 ) ) * k

else if side = 'R',

lwork ≥ ( mpc0 + nqc0 ) * k

end if

end if,

where

1986


lcmq = lcm / npcol with lcm = iclm( nprow, npcol),

iroffv = mod( iv-1, mb_v ), icoffv = mod( jv-1,nb_v ),

ivrow = indxg2p( iv, mb_v, myrow, rsrc_v, nprow),

ivcol = indxg2p( jv, nb_v, mycol, csrc_v, npcol),

MqV0 = numroc( m+icoffv, nb_v, mycol, ivcol,npcol ),

NpV0 = numroc( n+iroffv, mb_v, myrow, ivrow,nprow ),

iroffc = mod( ic-1, mb_c ), icoffc = mod( jc-1,nb_c ),

icrow = indxg2p( ic, mb_c, myrow, rsrc_c, nprow),

iccol = indxg2p(

jc, nb_c, mycol, csrc_c, npcol ),

MpC0 = numroc( m+iroffc, mb_c, myrow, icrow,nprow ),

NpC0 = numroc( n+icoffc, mb_c, myrow, icrow,nprow ),

NqC0 = numroc( n+icoffc, nb_c, mycol, iccol,npcol ),

ilcm, indxg2p, and numroc are ScaLAPACK tool functions;myrow, mycol, nprow, and npcol can be determined bycalling the subroutine blacs_gridinfo.

Output Parameters

(local).tREAL for pslarfbDOUBLE PRECISION for pdlarfbCOMPLEX for pclarfbCOMPLEX*16 for pzlarfb.

1987


Array, DIMENSION( mb_v, mb_v) if storev = 'R', and( nb_v, nb_v) if storev = 'C'. The triangular matrix tis the representation of the block reflector.

(local).cOn exit, sub(C) is overwritten by the Q * sub(C), or Q'*sub(C) or sub(C)*Q or sub(C)*Q'.

p?larfcApplies the conjugate transpose of an elementaryreflector to a general matrix.

Syntax

call pclarfc(side, m, n, v, iv, jv, descv, incv, tau, c, ic, jc, descc, work)

call pzlarfc(side, m, n, v, iv, jv, descv, incv, tau, c, ic, jc, descc, work)

Description

This routine applies a complex elementary reflector QH to a complex m-by-n distributed matrixsub(C) = C(ic:ic+m-1, jc:jc+n-1), from either the left or the right. Q is represented in theform

Q = i - tau * v * v',

where tau is a complex scalar and v is a complex vector.


Input Parameters

(global).CHARACTER.sideif side = 'L': form QH*sub(C) ;if side = 'R': form sub (C)*QH.



(global) INTEGER.n

1988


The number of columns to be operated on, that is, thenumber of columns of the distributed submatrix sub(C). (n

≥ 0).

(local).vCOMPLEX for pclarfcCOMPLEX*16 for pzlarfc.Pointer into the local memory to an array of DIMENSION(lld_v,*) containing the local pieces of the distributedvectors v representing the Householder transformation Q,v(iv:iv+m-1, jv) if side = 'L' and incv = 1,v(iv, jv:jv+m-1) if side = 'L' and incv = m_v,v(iv:iv+n-1, jv) if side = 'R' and incv = 1,v(iv, jv:jv+n-1) if side = 'R' and incv = m_v.The vector v is the representation of Q. v is not used if tau= 0.



descv


(local)tauCOMPLEX for pclarfcCOMPLEX*16 for pzlarfc.Array, DIMENSION LOCc(jv) if incv = 1, and LOCr(iv)otherwise. This array contains the Householder scalarsrelated to the Householder vectors.tau is tied to the distributed matrix V.

(local).cCOMPLEX for pclarfcCOMPLEX*16 for pzlarfc.

1989


Pointer into the local memory to an array of DIMENSION(lld_c, LOCc(jc+n-1) ), containing the local pieces ofsub(C).

(global) INTEGER.ic, jcThe row and column indices in the global array C indicatingthe first row and the first column of the submatrix sub(C),respectively.


descc

(local).workCOMPLEX for pclarfcCOMPLEX*16 for pzlarfc.Workspace array, DIMENSION (lwork).

If incv = 1,

if side = 'L' ,

if ivcol = iccol,

lwork ≥ nqc0

else

lwork ≥ mpc0 + max( 1, nqc0 )

end if

else if side = 'R',

lwork ≥ nqc0 + max( max( 1, mpc0 ), numroc(numroc(

n+icoffc,nb_v,0,0,npcol ), nb_v,0,0,lcmq ) )

end if

1990


else if incv = m_v,

if side = 'L',

lwork ≥ mpc0 + max( max( 1, nqc0 ), numroc(numroc(

m+iroffc,mb_v,0,0,nprow ),mb_v,0,0,lcmp ) )


if ivrow = icrow,

lwork ≥ mpc0

else

lwork ≥ nqc0 + max( 1, mpc0 )

end if

end if

end if,

where lcm is the least common multiple of nprow and npcoland lcm = ilcm( nprow, npcol ), lcmp = lcm /nprow, lcmq = lcm / npcol,iroffc = mod( ic-1, mb_c ), icoffc = mod( jc-1,nb_c ),icrow = indxg2p( ic, mb_c, myrow, rsrc_c, nprow),iccol = indxg2p( jc, nb_c, mycol, csrc_c, npcol),mpc0 = numroc( m+iroffc, mb_c, myrow, icrow,nprow ),nqc0 = numroc( n+icoffc, nb_c, mycol, iccol,npcol ),ilcm, indxg2p, and numroc are ScaLAPACK toolfunctions;myrow, mycol, nprow, and npcol can bedetermined by calling the subroutine blacs_gridinfo.

1991


Output Parameters

(local).cOn exit, sub(C) is overwritten by the QH * sub(C) if side ='L', or sub(C) * QH if side = 'R'.

p?larfgGenerates an elementary reflector (Householdermatrix).

Syntax

call pslarfg(n, alpha, iax, jax, x, ix, jx, descx, incx, tau)

call pdlarfg(n, alpha, iax, jax, x, ix, jx, descx, incx, tau)

call pclarfg(n, alpha, iax, jax, x, ix, jx, descx, incx, tau)

call pzlarfg(n, alpha, iax, jax, x, ix, jx, descx, incx, tau)

Description

This routine generates a real/complex elementary reflector H of order n, such that

H * sub(X) = H * ( x(iax, jax) ) = ( alpha ), H' * H = i,

( x ) ( 0 )

where alpha is a scalar (a real scalar - for complex flavors), and sub(X) is an (n-1)-elementreal/complex distributed vector X(ix:ix+n-2, jx) if incx = 1 and X(ix, jx:jx+n-2) ifincx = descx(m_). H is represented in the form

H = I - tau * (1) * (1 v' ) ,

( v )

where tau is a real/complex scalar and v is a real/complex (n-1)-element vector. Note that His not Hermitian.

If the elements of sub(X) are all zero (and X(iax, jax) is real for complex flavors), then tau= 0 and H is taken to be the unit matrix.

Otherwise 1 ≤ real(tau) ≤ 2 and abs(tau-1) ≤ 1.

1992


Input Parameters

(global) INTEGER.n

The global order of the elementary reflector. n ≥ 0.

(global) INTEGER.iax, jaxThe global row and column indices in x of X(iax, jax).

(local).xReal for pslarfgDOUBLE PRECISION for pdlarfgCOMPLEX for pclarfgCOMPLEX*16 for pzlarfg.Pointer into the local memory to an array of DIMENSION(lld_x, *). This array contains the local pieces of thedistributed vector sub(X). Before entry, the incrementedarray sub(X) must contain vector x.

(global) INTEGER.ix, jxThe row and column indices in the global array X indicatingthe first row and the first column of sub(X), respectively.

(global and local) INTEGER.descxArray of DIMENSION (dlen_). The array descriptor for thedistributed matrix X.

(global) INTEGER.incxThe global increment for the elements of x. Only two valuesof incx are supported in this version, namely 1 and m_x.incx must not be zero.

Output Parameters

(local)alphaREAL for pslafgDOUBLE PRECISION for pdlafgCOMPLEX for pclafgCOMPLEX*16 for pzlafg.On exit, alpha is computed in the process scope having thevector sub(X).

(local).xOn exit, it is overwritten with the vector v.

1993


(local).tauReal for pslarfgDOUBLE PRECISION for pdlarfgCOMPLEX for pclarfgCOMPLEX*16 for pzlarfg.Array, DIMENSION LOCc(jx) if incx = 1, and LOCr(ix)otherwise. This array contains the Householder scalarsrelated to the Householder vectors.tau is tied to the distributed matrix X.

p?larftForms the triangular vector T of a block reflectorH=I-V*T*VH.

Syntax

call pslarft(direct, storev, n, k, v, iv, jv, descv, tau, t, work)

call pdlarft(direct, storev, n, k, v, iv, jv, descv, tau, t, work)

call pclarft(direct, storev, n, k, v, iv, jv, descv, tau, t, work)

call pzlarft(direct, storev, n, k, v, iv, jv, descv, tau, t, work)

Description

This routine forms the triangular factor T of a real/complex block reflector H of order n, whichis defined as a product of k elementary reflectors.



If storev = 'C', the vector which defines the elementary reflector H(i) is stored in the i-thcolumn of the distributed matrix V, and

H = I - V * T * V'

If storev = 'R', the vector which defines the elementary reflector H(i) is stored in the i-throw of the distributed matrix V, and

H = I - V' * T *V.

1994


Input Parameters

(global) CHARACTER*1.directSpecifies the order in which the elementary reflectors aremultiplied to form the block reflector:if direct = 'F': H = H(1) H(2) . . . H(k) (forward)if direct = 'B': H = H(k) . . . H(2) H(1)(backward).

(global) CHARACTER*1.storevSpecifies how the vectors that define the elementaryreflectors are stored (See Application Notes below):if storev = 'C': columnwise;if storev = 'R': rowwise.

(global) INTEGER.n

The order of the block reflector H. n ≥ 0.

(global) INTEGER.kThe order of the triangular factor T (= the number ofelementary reflectors).

1 ≤ k ≤ mb_v (= nb_v).

REAL for pslarftvDOUBLE PRECISION for pdlarftCOMPLEX for pclarftCOMPLEX*16 for pzlarft.Pointer into the local memory to an array of local DIMENSION(LOCr(iv+n-1), LOCc(jv+k-1)) if storev = 'C', and(LOCr(iv+k-1), LOCc(jv+n-1)) if storev = 'R'.The distributed matrix V contains the Householder vectors.(See Application Notes below).

(global) INTEGER.iv, jvThe row and column indices in the global array v indicatingthe first row and the first column of the submatrix sub(V),respectively.


descv

(local)tauREAL for pslarft

1995


DOUBLE PRECISION for pdlarftCOMPLEX for pclarftCOMPLEX*16 for pzlarft.Array, DIMENSIONLOCr(iv+k-1) if incv = m_v, andLOCc(jv+k-1) otherwise. This array contains theHouseholder scalars related to the Householder vectors.tau is tied to the distributed matrix V.

(local).workREAL for pslarftDOUBLE PRECISION for pdlarftCOMPLEX for pclarftCOMPLEX*16 for pzlarft.Workspace array, DIMENSION (k*(k-1)/2).

Output Parameters

REAL for pslarftvDOUBLE PRECISION for pdlarftCOMPLEX for pclarftCOMPLEX*16 for pzlarft.

(local)tREAL for pslarftDOUBLE PRECISION for pdlarftCOMPLEX for pclarftCOMPLEX*16 for pzlarft.Array, DIMENSION (nb_v,nb_v) if storev = 'Col', and(mb_v,mb_v) otherwise. It contains the k-by-k triangularfactor of the block reflector associated with v. If direct ='F', t is upper triangular;if direct = 'B', t is lower triangular.

Application Notes

The shape of the matrix V and the storage of the vectors that define the H(i) is best illustratedby the following example with n = 5 and k = 3. The elements equal to 1 are not stored; thecorresponding array elements are modified but restored on exit. The rest of the array is notused.

1996


1997


p?larzApplies an elementary reflector as returned byp?tzrzf to a general matrix.

Syntax

call pslarz(side, m, n, l, v, iv, jv, descv, incv, tau, c, ic, jc, descc,work)

call pdlarz(side, m, n, l, v, iv, jv, descv, incv, tau, c, ic, jc, descc,work)

call pclarz(side, m, n, l, v, iv, jv, descv, incv, tau, c, ic, jc, descc,work)

call pzlarz(side, m, n, l, v, iv, jv, descv, incv, tau, c, ic, jc, descc,work)

Description

This routine applies a real/complex elementary reflector Q (or QT) to a real/complex m-by-ndistributed matrix sub(C) = C(ic:ic+m-1, jc:jc+n-1), from either the left or the right. Qis represented in the form

Q = I - tau * v * v',



Q is a product of k elementary reflectors as returned by p?tzrzf.

Input Parameters

(global) CHARACTER.sideif side = 'L': form Q* sub(C),if side = 'R': form sub (C)*Q, Q=QT (for real flavors).



(global) INTEGER.n

1998


The number of columns to be operated on, that is, thenumber of columns of the distributed submatrix sub(C). (n

≥ 0).

(global). INTEGER.lThe columns of the distributed submatrix sub(A) containingthe meaningful part of the Householder reflectors. If side

= 'L', m ≥ l ≥ 0,

if side = 'R', n ≥ l ≥ 0.

(local).vREAL for pslarzDOUBLE PRECISION for pdlarzCOMPLEX for pclarzCOMPLEX*16 for pzlarz.Pointer into the local memory to an array of DIMENSION(lld_v,*) containing the local pieces of the distributedvectors v representing the Householder transformation Q,v(iv:iv+l-1, jv) if side = 'L' and incv = 1,v(iv, jv:jv+l-1) if side = 'L' and incv = m_v,v(iv:iv+l-1, jv) if side = 'R' and incv = 1,v(iv, jv:jv+l-1) if side = 'R' and incv = m_v.The vector v in the representation of Q. v is not used if tau= 0.


iv, jv


descv


(local)tauREAL for pslarzDOUBLE PRECISION for pdlarzCOMPLEX for pclarzCOMPLEX*16 for pzlarz.

1999


Array, DIMENSION LOCc(jv) if incv = 1, and LOCr(iv)otherwise. This array contains the Householder scalarsrelated to the Householder vectors.tau is tied to the distributed matrix V.

(local).cREAL for pslarzDOUBLE PRECISION for pdlarzCOMPLEX for pclarzCOMPLEX*16 for pzlarz.Pointer into the local memory to an array of DIMENSION(lld_c, LOCc(jc+n-1) ), containing the local pieces ofsub(C).



descc

(local).workREAL for pslarzDOUBLE PRECISION for pdlarzCOMPLEX for pclarzCOMPLEX*16 for pzlarz.

2000


Array, DIMENSION (lwork)

If incv = 1,

if side = 'L' ,

if ivcol = iccol,

lwork ≥ NqC0

else

lwork ≥ MpC0 + max(1, NqC0)

end if


lwork ≥ NqC0 + max(max(1, MpC0),numroc(numroc(n+icoffc,nb_v,0,0,npcol),nb_v,0,0,lcmq))

end if

else if incv = m_v,

if side = 'L' ,

lwork ≥ MpC0 + max(max(1, NqC0),numroc(numroc(m+iroffc,mb_v,0,0,nprow),mb_v,0,0,lcmp))


if ivrow = icrow,

lwork ≥ MpC0

else

lwork ≥ NqC0 + max(1, MpC0)

end if

end if

end if,

where lcm is the least common multiple of nprow and npcolandlcm = ilcm( nprow, npcol ), lcmp = lcm / nprow,lcmq = lcm / npcol,

2001


iroffc = mod( ic-1, mb_c ), icoffc = mod( jc-1, nb_c ),icrow = indxg2p( ic, mb_c, myrow, rsrc_c, nprow ),iccol = indxg2p( jc, nb_c, mycol, csrc_c, npcol ),mpc0 = numroc( m+iroffc, mb_c, myrow, icrow, nprow ),nqc0 = numroc( n+icoffc, nb_c, mycol, iccol, npcol ),ilcm, indxg2p, and numroc are ScaLAPACK tool functions;myrow, mycol, nprow, and npcol can be determined bycalling the subroutine blacs_gridinfo.

Output Parameters

(local).cOn exit, sub(C) is overwritten by the Q * sub(C) if side= 'L', or sub(C) * Q if side = 'R'.

p?larzbApplies a block reflector or itstranspose/conjugate-transpose as returned byp?tzrzf to a general matrix.

Syntax

call pslarzb(side, trans, direct, storev, m, n, k, l, v, iv, jv, descv, t, c,ic, jc, descc, work)

call pdlarzb(side, trans, direct, storev, m, n, k, l, v, iv, jv, descv, t, c,ic, jc, descc, work)

call pclarzb(side, trans, direct, storev, m, n, k, l, v, iv, jv, descv, t, c,ic, jc, descc, work)

call pzlarzb(side, trans, direct, storev, m, n, k, l, v, iv, jv, descv, t, c,ic, jc, descc, work)

Description

This routine applies a real/complex block reflector Q or its transpose QT (conjugate transposeQH for complex flavors) to a real/complex distributed m-by-n matrix sub(C) = C(ic:ic+m-1,jc:jc+n-1) from the left or the right.


2002



Input Parameters

(global) CHARACTER.sideif side = 'L': apply Q or QT (QH for complex flavors) fromthe Left;if side = 'R': apply Q or QT (QH for complex flavors) fromthe Right.

(global) CHARACTER.transif trans = 'N': No transpose, apply Q;If trans='T': Transpose, apply QT (real flavors);If trans='C': Conjugate transpose, apply QH (complexflavors).

(global) CHARACTER.directIndicates how H is formed from a product of elementaryreflectors.if direct = 'F': H = H(1) H(2) . . . H(k) (forward,not supported)if direct = 'B': H = H(k) . . . H(2) H(1)(backward)

(global) CHARACTER.storevIndicates how the vectors that define the elementaryreflectors are stored:if storev = 'C': columnwise (not supported ).if storev = 'R': rowwise.




≥ 0).

(global) INTEGER.kThe order of the matrix T. (= the number of elementaryreflectors whose product defines the block reflector).

(global) INTEGER.l

2003


The columns of the distributed submatrix sub(A) containingthe meaningful part of the Householder reflectors.

If side = 'L', m ≥ l ≥ 0,

if side = 'R', n ≥ l ≥ 0.

(local).vREAL for pslarzbDOUBLE PRECISION for pdlarzbCOMPLEX for pclarzbCOMPLEX*16 for pzlarzb.Pointer into the local memory to an array ofDIMENSION(lld_v, LOCc(jv+m-1)) if side = 'L',(lld_v, LOCc(jv+m-1)) if side = 'R'.It contains the local pieces of the distributed vectors Vrepresenting the Householder transformation as returnedby p?tzrzf.

lld_v ≥ LOCr(iv+k-1).



descv

(local)tREAL for pslarzbDOUBLE PRECISION for pdlarzbCOMPLEX for pclarzbCOMPLEX*16 for pzlarzb.Array, DIMENSION mb_v by mb_v.The lower triangular matrix T in the representation of theblock reflector.

(local).cREAL for pslarfbDOUBLE PRECISION for pdlarfbCOMPLEX for pclarfbCOMPLEX*16 for pzlarfb.

2004


Pointer into the local memory to an array ofDIMENSION(lld_c, LOCc(jc+n-1)).On entry, the m-by-n distributed matrix sub(C).

(global) INTEGER.ic, jcThe row and column indices in the global array c indicatingthe first row and the first column of the submatrix sub(C),respectively.


descc

(local).workREAL for pslarzbDOUBLE PRECISION for pdlarzbCOMPLEX for pclarzbCOMPLEX*16 for pzlarzb.Array, DIMENSION (lwork).

If storev = 'C' ,

if side = 'L' ,

lwork ≥(NqC0 + MpC0)* k


lwork ≥ (NqC0 + max(NpV0 +numroc(numroc(n+icoffc, nb_v, 0, 0, npcol),nb_v, 0, 0, lcmq), mpc0))* k

end if

else if storev = 'R' ,

if side = 'L' ,

lwork ≥ (mpc0 + max(mqv0 + numroc(numroc(m+iroffc, mb_v, 0, 0, nprow), mb_v, 0, 0, lcmp),nqc0))* k


lwork ≥ (MpC0 + NqC0) * k

end if

end if,

2005


where lcmq = lcm/npcol with lcm = iclm(nprow,npcol),iroffv = mod(iv-1, mb_v), icoffv = mod( jv-1,nb_v),ivrow = indxg2p(iv, mb_v, myrow, rsrc_v, nprow),ivcol = indxg2p(jv, nb_v, mycol, csrc_v, npcol),MqV0 = numroc(m+icoffv, nb_v, mycol, ivcol,npcol),NpV0 = numroc(n+iroffv, mb_v, myrow, ivrow,nprow),iroffc = mod(ic-1, mb_c ), icoffc= mod( jc-1,nb_c),icrow= indxg2p(ic, mb_c, myrow, rsrc_c, nprow),iccol= indxg2p(jc, nb_c, mycol, csrc_c, npcol),MpC0 = numroc(m+iroffc, mb_c, myrow, icrow,nprow),NpC0 = numroc(n+icoffc, mb_c, myrow, icrow,nprow),NqC0 = numroc(n+icoffc, nb_c, mycol, iccol,npcol),ilcm, indxg2p, and numroc are ScaLAPACK tool functions;myrow, mycol, nprow, and npcol can be determined bycalling the subroutine blacs_gridinfo.

Output Parameters

(local).cOn exit, sub(C) is overwritten by the Q*sub(C), or Q'*sub(C), or sub(C)*Q, or sub(C)* Q'.

2006


p?larzcApplies (multiplies by) the conjugate transpose ofan elementary reflector as returned by p?tzrzfto a general matrix.

Syntax

call pclarzc(side, m, n, l, v, iv, jv, descv, incv, tau, c, ic, jc, descc,work)

call pzlarzc(side, m, n, l, v, iv, jv, descv, incv, tau, c, ic, jc, descc,work)

Description

This routine applies a complex elementary reflector QH to a complex m-by-n distributed matrixsub(C) = C(ic:ic+m-1, jc:jc+n-1), from either the left or the right. Q is represented in theform

Q = i - tau * v * v',

where tau is a complex scalar and v is a complex vector.



Input Parameters

(global) CHARACTER.sideif side = 'L': form QH *sub(C);if side = 'R': form sub(C)*QH .




≥ 0).

(global) INTEGER.l

2007


The columns of the distributed submatrix sub(A) containingthe meaningful part of the Householder reflectors.

If side = 'L', m ≥ l ≥ 0,

if side = 'R', n ≥ l ≥ 0.

(local).v

COMPLEX for pclarzcCOMPLEX*16 for pzlarzc.Pointer into the local memory to an array of DIMENSION(lld_v,*) containing the local pieces of the distributedvectors v representing the Householder transformation Q,v(iv:iv+l-1, jv) if side = 'L' and incv = 1,v(iv, jv:jv+l-1) if side = 'L' and incv = m_v,v(iv:iv+l-1, jv) if side = 'R' and incv = 1,v(iv, jv:jv+l-1) if side = 'R' and incv = m_v.The vector v in the representation of Q. v is not used if tau= 0.



descv

(global). INTEGER.incvThe global increment for the elements of v. Only two valuesof incv are supported in this version, namely 1 and m_v.incv must not be zero.

(local)tauCOMPLEX for pclarzcCOMPLEX*16 for pzlarzc.Array, DIMENSIONLOCc(jv) if incv = 1, and LOCr(iv)otherwise. This array contains the Householder scalarsrelated to the Householder vectors.tau is tied to the distributed matrix V.

(local).cCOMPLEX for pclarzcCOMPLEX*16 for pzlarzc.

2008


Pointer into the local memory to an array ofDIMENSION(lld_c, LOCc(jc+n-1) ), containing the localpieces of sub(C).



descc

2009


(local).

If incv = 1,

if side = 'L' ,

if ivcol = iccol,

lwork ≥ NqC0

else

lwork ≥ MpC0 + max(1, NqC0)

end if


lwork ≥ nqc0 + max(max(1, mpc0),numroc(numroc(n+icoffc,nb_v,0,0,npcol),nb_v,0,0,lcmq))

end if

else if incv = m_v,

if side = 'L' ,

lwork ≥ mpc0 + max(max(1, nqc0),numroc(numroc(m+iroffc,mb_v,0,0,nprow),mb_v,0,0,lcmp))

else if side = 'R',

if ivrow = icrow,

lwork ≥ mpc0

else

lwork ≥ nqc0 + max(1, mpc0)

end if

end if

end if,

work

where lcm is the least common multiple of nprow and npcol;lcm = ilcm(nprow, npcol), lcmp = lcm/nprow, lcmq=lcm/npcol,

2010


iroffc = mod(ic-1, mb_c), icoffc= mod(jc-1,nb_c),icrow = indxg2p(ic, mb_c, myrow, rsrc_c, nprow),iccol = indxg2p(jc, nb_c, mycol, csrc_c, npcol),MpC0 = numroc(m+iroffc, mb_c, myrow, icrow,nprow),NqC0 = numroc(n+icoffc, nb_c, mycol, iccol,npcol),ilcm, indxg2p, and numroc are ScaLAPACK tool functions;myrow, mycol, nprow, and npcol can be determined bycalling the subroutine blacs_gridinfo.

Output Parameters

(local).cOn exit, sub(C) is overwritten by the QH*sub(C) if side ='L', or sub(C)*QH if side = 'R'.

p?larztForms the triangular factor T of a block reflectorH=I-V*T*VH as returned by p?tzrzf.

Syntax

call pslarzt(direct, storev, n, k, v, iv, jv, descv, tau, t, work)

call pdlarzt(direct, storev, n, k, v, iv, jv, descv, tau, t, work)

call pclarzt(direct, storev, n, k, v, iv, jv, descv, tau, t, work)

call pzlarzt(direct, storev, n, k, v, iv, jv, descv, tau, t, work)

Description

This routine forms the triangular factor T of a real/complex block reflector H of order > n, whichis defined as a product of k elementary reflectors as returned by p?tzrzf.



2011


If storev = 'C', the vector which defines the elementary reflector H(i) is stored in the i-thcolumn of the array v, and

H = i - v * t * v'

If storev = 'R', the vector which defines the elementary reflector H(i) is stored in the i-throw of the array v, and

H = i - v' * t * v


Input Parameters

(global) CHARACTER.directSpecifies the order in which the elementary reflectors aremultiplied to form the block reflector:if direct = 'F': H = H(1) H(2) . . . H(k) (Forward,not supported)if direct = 'B': H = H(k) . . . H(2) H(1)(Backward).

(global) CHARACTER.storevSpecifies how the vectors which define the elementaryreflectors are stored:if storev = 'C': columnwise (not supported);if storev = 'R': rowwise.

(global). INTEGER.n

The order of the block reflector H. n ≥ 0.

(global). INTEGER.kThe order of the triangular factor T (= the number ofelementary reflectors).

1 ≤ k ≤ mb_v (= nb_v).

REAL for pslarztvDOUBLE PRECISION for pdlarztCOMPLEX for pclarztCOMPLEX*16 for pzlarzt.Pointer into the local memory to an array of localDIMENSION(LOCr(iv+k-1), LOCc(jv+n-1)).The distributed matrix V contains the Householder vectors.See Application Notes below.

2012



iv, jv


descv

(local)tauREAL for pslarztDOUBLE PRECISION for pdlarztCOMPLEX for pclarztCOMPLEX*16 for pzlarzt.Array, DIMENSION LOCr(iv+k-1) if incv = m_v, andLOCc(jv+k-1) otherwise. This array contains theHouseholder scalars related to the Householder vectors.tau is tied to the distributed matrix V.

(local).workREAL for pslarztDOUBLE PRECISION for pdlarztCOMPLEX for pclarztCOMPLEX*16 for pzlarzt.Workspace array, DIMENSION(k*(k-1)/2).

Output Parameters

REAL for pslarztvDOUBLE PRECISION for pdlarztCOMPLEX for pclarztCOMPLEX*16 for pzlarzt.

(local)tREAL for pslarztDOUBLE PRECISION for pdlarztCOMPLEX for pclarztCOMPLEX*16 for pzlarzt.Array, DIMENSION (mb_v, mb_v). It contains the k-by-ktriangular factor of the block reflector associated with v. tis lower triangular.

2013


Application Notes


2014


2015


p?lasclMultiplies a general rectangular matrix by a realscalar defined as Cto/Cfrom.

Syntax

call pslascl(type, cfrom, cto, m, n, a, ia, ja, desca, info)

call pdlascl(type, cfrom, cto, m, n, a, ia, ja, desca, info)

call pclascl(type, cfrom, cto, m, n, a, ia, ja, desca, info)

call pzlascl(type, cfrom, cto, m, n, a, ia, ja, desca, info)

Description

This routine multiplies the m-by-n real/complex distributed matrix sub(A) denoting A(ia:ia+m-1,ja:ja+n-1) by the real/complex scalar cto/cfrom. This is done without over/underflow aslong as the final result cto * A(i,j)/cfrom does not over/underflow. type specifies thatsub(A) may be full, upper triangular, lower triangular or upper Hessenberg.

Input Parameters

(global) CHARACTER.typetype indices of the storage type of the input distributedmatrix.if type = 'G': sub(A) is a full matrix,if type = 'L': sub(A) is a lower triangular matrix,if type = 'U': sub(A) is an upper triangular matrix,if type = 'H': sub(A) is an upper Hessenberg matrix.

(global)cfrom, ctoREAL for pslascl/pclasclDOUBLE PRECISION for pdlascl/pzlascl.The distributed matrix sub(A) is multiplied by cto/cfrom .A(i,j) is computed without over/underflow if the final resultcto*A(i,j)/cfrom can be represented withoutover/underflow. cfrom must be nonzero.



2016


≥ 0).

(local input/local output)aREAL for pslasclDOUBLE PRECISION for pdlasclCOMPLEX for pclasclCOMPLEX*16 for pzlascl.Pointer into the local memory to an array ofDIMENSION(lld_a, LOCc(ja+n-1)).This array contains the local pieces of the distributed matrixsub(A).

(global) INTEGER.ia, jaThe column and row indices in the global array A indicatingthe first row and column of the submatrix sub(A),respectively.

(global and local) INTEGER .descaArray of DIMENSION (dlen_).The array descriptor for thedistributed matrix A.

Output Parameters

(local).aOn exit, this array contains the local pieces of the distributedmatrix multiplied by cto/cfrom.

(local)infoINTEGER.if info = 0: the execution is successful.if info < 0: If the i-th argument is an array and thej-entry had an illegal value, then info = -(i*100+j),if the i-th argument is a scalar and had an illegal value,then info = -i.

2017

p?lasetInitializes the offdiagonal elements of a matrix toalpha and the diagonal elements to beta.

Syntax

call pslaset(uplo, m, n, alpha, beta, a, ia, ja, desca)

call pdlaset(uplo, m, n, alpha, beta, a, ia, ja, desca)

call pclaset(uplo, m, n, alpha, beta, a, ia, ja, desca)

call pzlaset(uplo, m, n, alpha, beta, a, ia, ja, desca)

Description

This routine initializes an m-by-n distributed matrix sub(A) denoting A(ia:ia+m-1, ja:ja+n-1)to beta on the diagonal and alpha on the offdiagonals.

Input Parameters

(global) CHARACTER.uploSpecifies the part of the distributed matrix sub(A) to be set:if uplo = 'U': upper triangular part is set; the strictly lowertriangular part of sub(A) is not changed;if uplo = 'L': lower triangular part is set; the strictly uppertriangular part of sub(A) is not changed.Otherwise: all of the matrix sub(A) is set.




≥ 0).

(global).alphaREAL for pslasetDOUBLE PRECISION for pdlasetCOMPLEX for pclasetCOMPLEX*16 for pzlaset.

2018


The constant to which the offdiagonal elements are to beset.

(global).betaREAL for pslasetDOUBLE PRECISION for pdlasetCOMPLEX for pclasetCOMPLEX*16 for pzlaset.The constant to which the diagonal elements are to be set.

Output Parameters

(local).aREAL for pslasetDOUBLE PRECISION for pdlasetCOMPLEX for pclasetCOMPLEX*16 for pzlaset.Pointer into the local memory to an array ofDIMENSION(lld_a, LOCc(ja+n-1)).This array contains the local pieces of the distributed matrixsub(A) to be set. On exit, the leading m-by-n submatrixsub(A) is set as follows:

if uplo = 'U', A(ia+i-1, ja+j-1) = alpha, 1 ≤ i ≤

j-1, 1 ≤ j ≤ n,

if uplo = 'L', A(ia+i-1, ja+j-1) = alpha, j+1 ≤ i ≤

m, 1 ≤ j ≤ n,

otherwise, A(ia+i-1, ja+j-1) = alpha, 1 ≤ i ≤ m, 1 ≤

j ≤ n, ia+i.ne.ja+j, and, for all uplo, A(ia+i-1,

ja+i-1) = beta, 1 ≤ i ≤ min(m,n).

(global) INTEGER.ia, jaThe column and row indices in the global array A indicatingthe first row and column of the submatrix sub(A),respectively.

(global and local) INTEGER .descaArray of DIMENSION (dlen_). The array descriptor for thedistributed matrix A.

2019


p?lasmsubLooks for a small subdiagonal element from thebottom of the matrix that it can safely set to zero.

Syntax

call pslasmsub(a, desca, i, l, k, smlnum, buf, lwork)

call pdlasmsub(a, desca, i, l, k, smlnum, buf, lwork)

Description

This routine looks for a small subdiagonal element from the bottom of the matrix that it cansafely set to zero. This routine does a global maximum and must be called by all processes.

Input Parameters

(global)aREAL for pslasmsubDOUBLE PRECISION for pdlasmsubArray, DIMENSION(desca(lld_),*).On entry, the Hessenberg matrix whose tridiagonal part isbeing scanned. Unchanged on exit.

(global and local) INTEGER.descaArray of DIMENSION (dlen_). The array descriptor for thedistributed matrix A.

(global) INTEGER.iThe global location of the bottom of the unreduced submatrixof A. Unchanged on exit.

(global) INTEGER.lThe global location of the top of the unreduced submatrixof A.Unchanged on exit.

(global)smlnumREAL for pslasmsubDOUBLE PRECISION for pdlasmsubOn entry, a “small number” for the given matrix. Unchangedon exit.

(global) INTEGER.lwork

2020


On exit, lwork is the size of the work buffer.This must be at least 2*ceil(ceil((i-l)/hbl )/lcm(nprow,npcol)). Here lcm is least common multiple,and nprow x npcol is the logical grid size.

Output Parameters

(global) INTEGER.kOn exit, this yields the bottom portion of the unreduced

submatrix. This will satisfy: l ≤ m ≤ i-1.

(local).bufREAL for pslasmsubDOUBLE PRECISION for pdlasmsubArray of size lwork.

p?lassqUpdates a sum of squares represented in scaledform.

Syntax

call pslassq(n, x, ix, jx, descx, incx, scale, sumsq)

call pdlassq(n, x, ix, jx, descx, incx, scale, sumsq)

call pclassq(n, x, ix, jx, descx, incx, scale, sumsq)

call pzlassq(n, x, ix, jx, descx, incx, scale, sumsq)

Description

This routine returns the values scl and smsq such that

scl2 * smsq = x(1)2 + ... + x(n)2 + scale2*sumsq,

where x( i ) = sub(X) = X(ix + (jx-1)*descx(m_) + (i - 1)*incx) forpslassq/pdlassq and x( i ) = sub(X) = abs(X(ix + (jx-1)*descx(m_) + (i -1)*incx) for pclassq/pzlassq.

For real routines pslassq/pdlassq the value of sumsq is assumed to be non-negative and sclreturns the value

scl = max( scale, abs(x(i))).

2021


For complex routines pclassq/pzlassq the value of sumsq is assumed to be at least unity andthe value of ssq will then satisfy

1.0 ≤ ssq ≤ sumsq +2n

Value scale is assumed to be non-negative and scl returns the value

For all routines p?lassq values scale and sumsq must be supplied in scale and sumsqrespectively, and scale and sumsq are overwritten by scl and ssq respectively.

All routines p?lassq make only one pass through the vector sub(x).

Input Parameters

(global) INTEGER.nThe length of the distributed vector sub(x ).

REAL for pslassqxDOUBLE PRECISION for pdlassqCOMPLEX for pclassqCOMPLEX*16 for pzlassq.The vector for which a scaled sum of squares is computed:

x(ix + (jx-1)*m_x + (i - 1)*incx), 1 ≤ i ≤ n.

(global) INTEGER.ixThe row index in the global array X indicating the first rowof sub(X).

(global) INTEGER.jxThe column index in the global array X indicating the firstcolumn of sub(X).

(global and local) INTEGER array of DIMENSION (dlen_).descxThe array descriptor for the distributed matrix X.

(global) INTEGER.incxThe global increment for the elements of X. Only two valuesof incx are supported in this version, namely 1 and m_x.The argument incx must not equal zero.

(local).scale

2022


REAL for pslassq/pclassqDOUBLE PRECISION for pdlassq/pzlassq.On entry, the value scale in the equation above.

(local)sumsqREAL for pslassq/pclassqDOUBLE PRECISION for pdlassq/pzlassq.On entry, the value sumsq in the equation above.

Output Parameters

(local).scaleOn exit, scale is overwritten with scl , the scaling factorfor the sum of squares.

(local).sumsqOn exit, sumsq is overwritten with the value smsq, the basicsum of squares from which scl has been factored out.

p?laswpPerforms a series of row interchanges on a generalrectangular matrix.

Syntax

call pslaswp(direc, rowcol, n, a, ia, ja, desca, k1, k2, ipiv)

call pdlaswp(direc, rowcol, n, a, ia, ja, desca, k1, k2, ipiv)

call pclaswp(direc, rowcol, n, a, ia, ja, desca, k1, k2, ipiv)

call pzlaswp(direc, rowcol, n, a, ia, ja, desca, k1, k2, ipiv)

Description

This routine performs a series of row or column interchanges on the distributed matrixsub(A)=A(ia:ia+n-1, ja:ja+n-1). One interchange is initiated for each of rows or columnsk1 through k2 of sub(A). This routine assumes that the pivoting information has already beenbroadcast along the process row or column. Also note that this routine will only work for k1-k2being in the same mb (or nb) block. If you want to pivot a full matrix, use p?lapiv.

2023


Input Parameters

(global) CHARACTER.direcSpecifies in which order the permutation is applied:= 'F' - forward,= 'B' - backward.

(global) CHARACTER.rowcolSpecifies if the rows or columns are permuted:= 'R' - rows,= 'C' - columns.

(global) INTEGER.nIf rowcol='R', the length of the rows of the distributedmatrix A(*, ja:ja+n-1) to be permuted;If rowcol='C', the length of the columns of the distributedmatrix A(ia:ia+n-1 , *) to be permuted;

(local)aREAL for pslaswpDOUBLE PRECISION for pdlaswpCOMPLEX for pclaswpCOMPLEX*16 for pzlaswp.Pointer into the local memory to an array of DIMENSION(lld_a, *). On entry, this array contains the local pieces ofthe distributed matrix to which the row/columnsinterchanges will be applied.

(global) INTEGER.iaThe row index in the global array A indicating the first rowof sub(A).

(global) INTEGER.jaThe column index in the global array A indicating the firstcolumn of sub(A).

(global and local) INTEGER array of DIMENSION (dlen_).descaThe array descriptor for the distributed matrix A.

(global) INTEGER.k1The first element of ipiv for which a row or columninterchange will be done.

(global) INTEGER.k2

2024


The last element of ipiv for which a row or columninterchange will be done.

(local)ipivINTEGER. Array, DIMENSION LOCr(m_a)+mb_a for rowpivoting and LOCr(n_a)+nb_a for column pivoting. Thisarray is tied to the matrix A, ipiv(k)=l implies rows (orcolumns) k and l are to be interchanged.

Output Parameters

(local)AREAL for pslaswpDOUBLE PRECISION for pdlaswpCOMPLEX for pclaswpCOMPLEX*16 for pzlaswp.On exit, the permuted distributed matrix.

p?latraComputes the trace of a general square distributedmatrix.

Syntax

val = pslatra(n, a, ia, ja, desca)

val = pdlatra(n, a, ia, ja, desca)

val = pclatra(n, a, ia, ja, desca)

val = pzlatra(n, a, ia, ja, desca)

Description

This function computes the trace of an n-by-n distributed matrix sub(A) denoting A(ia:ia+n-1,ja:ja+n-1). The result is left on every process of the grid.

Input Parameters

(global) INTEGER.nThe number of rows and columns to be operated on, that

is, the order of the distributed submatrix sub(A). n ≥ 0.

2025


(local).aReal for pslatraDOUBLE PRECISION for pdlatraCOMPLEX for pclatraCOMPLEX*16 for pzlatra.Pointer into the local memory to an array ofDIMENSION(lld_a, LOCc(ja+n-1)) containing the localpieces of the distributed matrix, the trace of which is to becomputed.

(global) INTEGER. The row and column indices respectivelyin the global array A indicating the first row and the firstcolumn of the submatrix sub(A), respectively.

ia, ja

(global and local) INTEGER array of DIMENSION (dlen_).The array descriptor for the distributed matrix A.

desca

Output Parameters


p?latrdReduces the first nb rows and columns of asymmetric/Hermitian matrix A to real tridiagonalform by an orthogonal/unitary similaritytransformation.

Syntax

call pslatrd(uplo, n, nb, a, ia, ja, desca, d, e, tau, w, iw, jw, descw, work)

call pdlatrd(uplo, n, nb, a, ia, ja, desca, d, e, tau, w, iw, jw, descw, work)

call pclatrd(uplo, n, nb, a, ia, ja, desca, d, e, tau, w, iw, jw, descw, work)

call pzlatrd(uplo, n, nb, a, ia, ja, desca, d, e, tau, w, iw, jw, descw, work)

Description

This routine reduces nb rows and columns of a real symmetric or complex Hermitian matrixsub(A)= A(ia:ia+n-1, ja:ja+n-1) to symmetric/complex tridiagonal form by anorthogonal/unitary similarity transformation Q'* sub(A)* Q, and returns the matrices V andW, which are needed to apply the transformation to the unreduced part of sub(A).

2026


If uplo = U, p?latrd reduces the last nb rows and columns of a matrix, of which the uppertriangle is supplied;

if uplo = L, p?latrd reduces the first nb rows and columns of a matrix, of which the lowertriangle is supplied.

This is an auxiliary routine called by p?sytrd/p?hetrd.

Input Parameters

(global) CHARACTER.uploSpecifies whether the upper or lower triangular part of thesymmetric/Hermitian matrix sub(A) is stored:= 'U': Upper triangular= L: Lower triangular.



(global) INTEGER.nbThe number of rows and columns to be reduced.

REAL for pslatrdaDOUBLE PRECISION for pdlatrdCOMPLEX for pclatrdCOMPLEX*16 for pzlatrd.Pointer into the local memory to an array ofDIMENSION(lld_a, LOCc(ja+n-1)).On entry, this array contains the local pieces of thesymmetric/Hermitian distributed matrix sub(A).If uplo = U, the leading n-by-n upper triangular part ofsub(A) contains the upper triangular part of the matrix, andits strictly lower triangular part is not referenced.If uplo = L, the leading n-by-n lower triangular part ofsub(A) contains the lower triangular part of the matrix, andits strictly upper triangular part is not referenced.


(global) INTEGER.ja

2027


The column index in the global array A indicating the firstcolumn of sub(A).


desca

(global) INTEGER.iwThe row index in the global array W indicating the first rowof sub(W).

(global) INTEGER.jwThe column index in the global array W indicating the firstcolumn of sub(W).

(global and local) INTEGER array of DIMENSION (dlen_).The array descriptor for the distributed matrix W.

descw

(local)workREAL for pslatrdDOUBLE PRECISION for pdlatrdCOMPLEX for pclatrdCOMPLEX*16 for pzlatrd.Workspace array of DIMENSION (nb_a).

Output Parameters

(local)aOn exit, if uplo = 'U', the last nb columns have beenreduced to tridiagonal form, with the diagonal elementsoverwriting the diagonal elements of sub(A); the elementsabove the diagonal with the array tau represent theorthogonal/unitary matrix Q as a product of elementaryreflectors;if uplo = 'L', the first nb columns have been reduced totridiagonal form, with the diagonal elements overwriting thediagonal elements of sub(A); the elements below thediagonal with the array tau represent the orthogonal/unitarymatrix Q as a product of elementary reflectors.

(local)dREAL for pslatrd/pclatrdDOUBLE PRECISION for pdlatrd/pzlatrd.Array, DIMENSION LOCc(ja+n-1).

2028


The diagonal elements of the tridiagonal matrix T: d(i) =a(i,i). d is tied to the distributed matrix A.

(local)eREAL for pslatrd/pclatrdDOUBLE PRECISION for pdlatrd/pzlatrd.Array, DIMENSION LOCc(ja+n-1) if uplo = 'U',LOCc(ja+n-2) otherwise.The off-diagonal elements of the tridiagonal matrix T:e(i) = a(i, i + 1) if uplo = 'U',e(i) = a(i + 1, i) if uplo = L.e is tied to the distributed matrix A.

(local)tauREAL for pslatrdDOUBLE PRECISION for pdlatrdCOMPLEX for pclatrdCOMPLEX*16 for pzlatrd.Array, DIMENSION LOCc(ja+n-1). This array contains thescalar factors tau of the elementary reflectors. tau is tiedto the distributed matrix A.

(local)wREAL for pslatrdDOUBLE PRECISION for pdlatrdCOMPLEX for pclatrdCOMPLEX*16 for pzlatrd.Pointer into the local memory to an array of DIMENSION(lld_w, nb_w). This array contains the local pieces of then-by-nb_w matrix w required to update the unreduced partof sub(A).

Application Notes


Q = H(n) H(n-1) . . . H(n-nb+1)


H(i) = I - tau*v*v' ,

where tau is a real/complex scalar, and v is a real/complex vector with v(i:n) = 0 and v(i-1)= 1; v(1:i-1) is stored on exit in A(ia:ia+i-1, ja+i), and tau in tau(ja+i-1).

2029


If uplo = L, the matrix Q is represented as a product of elementary reflectors

Q = H(1) H(2) . . . H(nb)


H(i) = I - tau*v*v' ,

where tau is a real/complex scalar, and v is a real/complex vector with v(1:i) = 0 and v(i+1)= 1; v(i+2: n) is stored on exit in A(ia+i+1: ia+n-1, ja+i-1), and tau in tau(ja+i-1).

The elements of the vectors v together form the n-by-nb matrix V which is needed, with W, toapply the transformation to the unreduced part of the matrix, using a symmetric/Hermitianrank-2k update of the form:

sub(A) := sub(A) - vw' - wv'.

The contents of a on exit are illustrated by the following examples with

n = 5 and nb = 2:

where d denotes a diagonal element of the reduced matrix, a denotes an element of the originalmatrix that is unchanged, and vi denotes an element of the vector defining H(i).

2030


p?latrsSolves a triangular system of equations with thescale factor set to prevent overflow.

Syntax

call pslatrs(uplo, trans, diag, normin, n, a, ia, ja, desca, x, ix, jx, descx,scale, cnorm, work)

call pdlatrs(uplo, trans, diag, normin, n, a, ia, ja, desca, x, ix, jx, descx,scale, cnorm, work)

call pclatrs(uplo, trans, diag, normin, n, a, ia, ja, desca, x, ix, jx, descx,scale, cnorm, work)

call pzlatrs(uplo, trans, diag, normin, n, a, ia, ja, desca, x, ix, jx, descx,scale, cnorm, work)

Description

This routine solves a triangular system of equations Ax = σb, ATx = σb or AHx = σb, where

σ is a scale factor set to prevent overflow. The description of the routine will be extended inthe future releases.

Input Parameters

CHARACTER*1.uploSpecifies whether the matrix A is upper or lower triangular.= 'U': Upper triangular= 'L': Lower triangular

CHARACTER*1.transSpecifies the operation applied to Ax.= 'N': Solve Ax = s*b (no transpose)= 'T': Solve ATx = s*b (transpose)= 'C': Solve AHx = s*b (conjugate transpose),where s - is a scale factor

CHARACTER*1.diagSpecifies whether or not the matrix A is unit triangular.= 'N': Non-unit triangular= 'U': Unit triangular

2031



INTEGER.n

The order of the matrix A. n ≥ 0

REAL for pslatrs/pclatrsaDOUBLE PRECISION for pdlatrs/pzlatrsArray, DIMENSION (lda, n). Contains the triangular matrixA.If uplo = U, the leading n-by-n upper triangular part of thearray a contains the upper triangular matrix, and the strictlylower triangular part of a is not referenced.If uplo = 'L', the leading n-by-n lower triangular part ofthe array a contains the lower triangular matrix, and thestrictly upper triangular part of a is not referenced.If diag = 'U', the diagonal elements of a are also notreferenced and are assumed to be 1.


ia, ja


desca

REAL for pslatrs/pclatrsxDOUBLE PRECISION for pdlatrs/pzlatrsArray, DIMENSION (n). On entry, the right hand side b ofthe triangular system.

(global) INTEGER.The row index in the global array xindicating the first row of sub(x).

ix

(global) INTEGER.jxThe column index in the global array x indicating the firstcolumn of sub(x).

(global and local) INTEGER.descxArray, DIMENSION (dlen_). The array descriptor for thedistributed matrix X.

2032


REAL for pslatrs/pclatrscnormDOUBLE PRECISION for pdlatrs/pzlatrs.Array, DIMENSION (n). If normin = 'Y', cnorm is an inputargument and cnorm (j) contains the norm of theoff-diagonal part of the j-th column of A. If trans = 'N',cnorm (j) must be greater than or equal to the infinity-norm,and if trans = 'T' or 'C', cnorm(j) must be greater thanor equal to the 1-norm.

(local).workREAL for pslatrsDOUBLE PRECISION for pdlatrsCOMPLEX for pclatrsCOMPLEX*16 for pzlatrs.Temporary workspace.

Output Parameters

On exit, x is overwritten by the solution vector x.X

REAL for pslatrs/pclatrsscaleDOUBLE PRECISION for pdlatrs/pzlatrs.Array, DIMENSION (lda, n). The scaling factor s for thetriangular system as described above.If scale = 0, the matrix A is singular or badly scaled, andthe vector x is an exact or approximate solution to Ax =0.

If normin = 'N', cnorm is an output argument and cnorm(j)returns the 1-norm of the off-diagonal part of the j-thcolumn of A.

cnorm

2033


p?latrzReduces an upper trapezoidal matrix to uppertriangular form by means of orthogonal/unitarytransformations.

Syntax

call pslatrz(m, n, l, a, ia, ja, desca, tau, work)

call pdlatrz(m, n, l, a, ia, ja, desca, tau, work)

call pclatrz(m, n, l, a, ia, ja, desca, tau, work)

call pzlatrz(m, n, l, a, ia, ja, desca, tau, work)

Description

This routine reduces the m-by-n(m ≤ n) real/complex upper trapezoidal matrix sub(A) =[A(ia:ia+m-1, ja:ja+m-1) A(ia:ia+m-1, ja+n-l:ja+n-1)] to upper triangular form bymeans of orthogonal/unitary transformations.

The upper trapezoidal matrix sub(A) is factored as

sub(A) = ( R 0 )*Z,


Input Parameters


of rows of the distributed submatrix sub(A). m ≥ 0.

(global) INTEGER.nThe number of columns to be operated on, that is, thenumber of columns of the distributed submatrix sub(A). n

≥ 0.

(global) INTEGER.lThe number of columns of the distributed submatrix sub(A)containing the meaningful part of the Householder reflectors.l > 0.

(local)a

2034


REAL for pslatrzDOUBLE PRECISION for pdlatrzCOMPLEX for pclatrzCOMPLEX*16 for pzlatrz.Pointer into the local memory to an array ofDIMENSION(lld_a, LOCc(ja+n-1)). On entry, the localpieces of the m-by-n distributed matrix sub(A), which is tobe factored.




(local)workREAL for pslatrzDOUBLE PRECISION for pdlatrzCOMPLEX for pclatrzCOMPLEX*16 for pzlatrz.Workspace array, DIMENSION (lwork).

lwork ≥ nq0 + max(1, mp0), whereiroff = mod(ia-1, mb_a),icoff = mod(ja-1, nb_a),iarow = indxg2p(ia, mb_a, myrow, rsrc_a, nprow),iacol = indxg2p(ja, nb_a, mycol, csrc_a, npcol),mp0 = numroc(m+iroff, mb_a, myrow, iarow, nprow),nq0 = numroc(n+icoff, nb_a, mycol, iacol, npcol),numroc, indxg2p, and numroc are ScaLAPACK toolfunctions; myrow, mycol, nprow, and npcol can bedetermined by calling the subroutine blacs_gridinfo.

2035


Output Parameters

On exit, the leading m-by-m upper triangular part of sub(A)contains the upper triangular matrix R, and elements n-l+1to n of the first m rows of sub(A), with the array tau,represent the orthogonal/unitary matrix Z as a product ofm elementary reflectors.

a

(local)tauREAL for pslatrzDOUBLE PRECISION for pdlatrzCOMPLEX for pclatrzCOMPLEX*16 for pzlatrz.Array, DIMENSION(LOCr(ja+m-1)). This array contains thescalar factors of the elementary reflectors. tau is tied to thedistributed matrix A.

Application Notes

The factorization is obtained by Householder's method. The k-th transformation matrix, Z(k),which is used (or, in case of complex routines, whose conjugate transpose is used) to introducezeros into the (m - k + 1)-th row of sub(A), is given in the form

where

2036


tau is a scalar and z( k ) is an (n-m)-element vector. tau and z( k ) are chosen to annihilatethe elements of the k-th row of sub(A). The scalar tau is returned in the k-th element of tauand the vector u( k ) in the k-th row of sub(A), such that the elements of z(k ) are in a(k, m + 1 ), ..., a( k, n ). The elements of R are returned in the upper triangular partof sub(A).

Z is given by

Z = Z( 1 ) Z( 2 ) ... Z( m ).

p?lauu2Computes the product U*UH or LH*L, where U andL are upper or lower triangular matrices (localunblocked algorithm).

Syntax

call pslauu2(uplo, n, a, ia, ja, desca)

call pdlauu2(uplo, n, a, ia, ja, desca)

call pclauu2(uplo, n, a, ia, ja, desca)

call pzlauu2(uplo, n, a, ia, ja, desca)

Description

This routine computes the product U*U' or L'*L, where the triangular factor U or L is stored inthe upper or lower triangular part of the distributed matrix

sub(A)= A(ia:ia+n-1, ja:ja+n-1).

If uplo = 'U' or 'u', then the upper triangle of the result is stored, overwriting the factor Uin sub(A).

If uplo = 'L' or 'l', then the lower triangle of the result is stored, overwriting the factor Lin sub(A).

This is the unblocked form of the algorithm, calling BLAS Level 2 Routines. No communicationis performed by this routine, the matrix to operate on should be strictly local to one process.

Input Parameters

(global) CHARACTER*1.uplo

2037


Specifies whether the triangular factor stored in the matrixsub(A) is upper or lower triangular:= U: upper triangular= L: lower triangular.


is, the order of the triangular factor U or L. n ≥ 0.

(local)aREAL for pslauu2DOUBLE PRECISION for pdlauu2COMPLEX for pclauu2COMPLEX*16 for pzlauu2.Pointer into the local memory to an array ofDIMENSION(lld_a, LOCc(ja+n-1). On entry, the localpieces of the triangular factor U or L.




desca

Output Parameters

(local)aOn exit, if uplo = 'U', the upper triangle of the distributedmatrix sub(A) is overwritten with the upper triangle of theproduct U*U'; if uplo = 'L', the lower triangle of sub(A)is overwritten with the lower triangle of the product L'*L.

2038


p?lauumComputes the product U*UH or LH*L, where U andL are upper or lower triangular matrices.

Syntax

call pslauum(uplo, n, a, ia, ja, desca)

call pdlauum(uplo, n, a, ia, ja, desca)

call pclauum(uplo, n, a, ia, ja, desca)

call pzlauum(uplo, n, a, ia, ja, desca)

Description

This routine computes the product U*U' or L'*L, where the triangular factor U or L is stored inthe upper or lower triangular part of the matrix sub(A)= A(ia:ia+n-1, ja:ja+n-1).

If uplo = 'U' or 'u', then the upper triangle of the result is stored, overwriting the factor Uin sub(A). If uplo = 'L' or 'l', then the lower triangle of the result is stored, overwriting thefactor L in sub(A).

This is the blocked form of the algorithm, calling Level 3 PBLAS.

Input Parameters

(global) CHARACTER*1.uploSpecifies whether the triangular factor stored in the matrixsub(A) is upper or lower triangular:= 'U': upper triangular= 'L': lower triangular.


is, the order of the triangular factor U or L. n ≥ 0.

(local)aREAL for pslauumDOUBLE PRECISION for pdlauumCOMPLEX for pclauumCOMPLEX*16 for pzlauum.

2039


Pointer into the local memory to an array ofDIMENSION(lld_a, LOCc(ja+n-1). On entry, the localpieces of the triangular factor U or L.




desca

Output Parameters

(local)aOn exit, if uplo = 'U', the upper triangle of the distributedmatrix sub(A) is overwritten with the upper triangle of theproduct U*U' ; if uplo = 'L', the lower triangle of sub(A)is overwritten with the lower triangle of the product L'*L.

p?lawilForms the Wilkinson transform.

Syntax

call pslawil(ii, jj, m, a, desca, h44, h33, h43h34, v)

call pdlawil(ii, jj, m, a, desca, h44, h33, h43h34, v)

Description

This routine gets the transform given by h44, h33, and h43h34 into v starting at row m.

Input Parameters

(global) INTEGER.iiRow owner of h(m+2, m+2).

(global) INTEGER.jjColumn owner of h(m+2, m+2).

2040


(global) INTEGER.mOn entry, the location from where the transform starts (rowm). Unchanged on exit.

(global)aREAL for pslawilDOUBLE PRECISION for pdlawilArray, DIMENSION (desca(lld_),*).On entry, the Hessenberg matrix. Unchanged on exit.

(global and local) INTEGERdescaArray of DIMENSION (dlen_). The array descriptor for thedistributed matrix A. Unchanged on exit.

(global)REAL for pslawil

h43h34 DOUBLE PRECISION for pdlawilThese three values are for the double shift QR iteration.Unchanged on exit.

Output Parameters

(global)vREAL for pslawilDOUBLE PRECISION for pdlawilArray of size 3 that contains the transform on output.

p?org2l/p?ung2lGenerates all or part of the orthogonal/unitarymatrix Q from a QL factorization determined byp?geqlf (unblocked algorithm).

Syntax

call psorg2l(m, n, k, a, ia, ja, desca, tau, work, lwork, info)

call pdorg2l(m, n, k, a, ia, ja, desca, tau, work, lwork, info)

call pcung2l(m, n, k, a, ia, ja, desca, tau, work, lwork, info)

call pzung2l(m, n, k, a, ia, ja, desca, tau, work, lwork, info)

2041


Description

The routine p?org2l/p?ung2l generates an m-by-n real/complex distributed matrix Q denotingA(ia:ia+m-1, ja:ja+n-1) with orthonormal columns, which is defined as the last n columnsof a product of k elementary reflectors of order m:

Q = H(k) . . . H(2) H(1) as returned by p?geqlf.

Input Parameters


of rows of the distributed submatrix Q. m ≥ 0.


number of columns of the distributed submatrix Q. m ≥ n

≥ 0.

(global) INTEGER.kThe number of elementary reflectors whose product defines

the matrix Q. n≥ k ≥ 0.

REAL for psorg2laDOUBLE PRECISION for pdorg2lCOMPLEX for pcung2lCOMPLEX*16 for pzung2l.Pointer into the local memory to an array, DIMENSION(lld_a, LOCc(ja+n-1).On entry, the j-th column must contain the vector that

defines the elementary reflector H(j), ja+n-k ≤ j ≤ja+n-k, as returned by p?geqlf in the k columns of itsdistributed matrix argument A(ia:*,ja+n-k:ja+n-1).



2042



desca

(local)tauREAL for psorg2lDOUBLE PRECISION for pdorg2lCOMPLEX for pcung2lCOMPLEX*16 for pzung2l.Array, DIMENSION LOCc(ja+n-1).This array contains the scalar factor tau(j) of theelementary reflector H(j), as returned by p?geqlf.

(local)workREAL for psorg2lDOUBLE PRECISION for pdorg2lCOMPLEX for pcung2lCOMPLEX*16 for pzung2l.Workspace array, DIMENSION (lwork).


lwork is local input and must be at least lwork ≥ mpa0 +max(1, nqa0), whereiroffa = mod(ia-1, mb_a),icoffa = mod(ja-1, nb_a),iarow = indxg2p(ia, mb_a, myrow, rsrc_a, nprow),iacol = indxg2p(ja, nb_a, mycol, csrc_a, npcol),mpa0 = numroc(m+iroffa, mb_a, myrow, iarow,nprow),nqa0 = numroc(n+icoffa, nb_a, mycol, iacol,npcol).indxg2p and numroc are ScaLAPACK tool functions; myrow,mycol, nprow, and npcol can be determined by calling thesubroutine blacs_gridinfo.If lwork = -1, then lwork is global input and a workspacequery is assumed; the routine only calculates the minimumand optimal size for all work arrays. Each of these valuesis returned in the first entry of the corresponding work array,and no error message is issued by pxerbla.

2043


Output Parameters

On exit, this array contains the local pieces of the m-by-ndistributed matrix Q.

a


(local) INTEGER.info= 0: successful exit< 0: if the i-th argument is an array and the j-entry hadan illegal value,then info = - (i*100 +j),if the i-th argument is a scalar and had an illegal value,then info = -i.

p?org2r/p?ung2rGenerates all or part of the orthogonal/unitarymatrix Q from a QR factorization determined byp?geqrf (unblocked algorithm).

Syntax

call psorg2r(m, n, k, a, ia, ja, desca, tau, work, lwork, info)

call pdorg2r(m, n, k, a, ia, ja, desca, tau, work, lwork, info)

call pcung2r(m, n, k, a, ia, ja, desca, tau, work, lwork, info)

call pzung2r(m, n, k, a, ia, ja, desca, tau, work, lwork, info)

Description

The routine p?org2r/p?ung2r generates an m-by-n real/complex matrix Q denotingA(ia:ia+m-1, ja:ja+n-1) with orthonormal columns, which is defined as the first n columnsof a product of k elementary reflectors of order m:

Q = H(1) H(2) . . . H(k)


Input Parameters

(global) INTEGER.m

2044

The number of rows to be operated on, that is, the number

of rows of the distributed submatrix Q.m ≥ 0.


number of columns of the distributed submatrix Q. m ≥ n

≥ 0.


the matrix Q. n ≥ k ≥ 0.

REAL for psorg2raDOUBLE PRECISION for pdorg2rCOMPLEX for pcung2rCOMPLEX*16 for pzung2r.Pointer into the local memory to an array,DIMENSION(lld_a, LOCc(ja+n-1).On entry, the j-th column must contain the vector that

defines the elementary reflector H(j), ja ≤ j ≤ ja+k-1,as returned by p?geqrf in the k columns of its distributedmatrix argument A(ia:*,ja:ja+k-1).




(local)tauREAL for psorg2rDOUBLE PRECISION for pdorg2rCOMPLEX for pcung2rCOMPLEX*16 for pzung2r.Array, DIMENSION LOCc(ja+k-1).

2045


This array contains the scalar factor tau(j) of theelementary reflector H(j), as returned by p?geqrf. Thisarray is tied to the distributed matrix A.

(local)workREAL for psorg2rDOUBLE PRECISION for pdorg2rCOMPLEX for pcung2rCOMPLEX*16 for pzung2r.Workspace array, DIMENSION (lwork).


lwork is local input and must be at least lwork ≥ mpa0 +max(1, nqa0),whereiroffa = mod(ia-1, mb_a , icoffa = mod(ja-1,nb_a),iarow = indxg2p(ia, mb_a, myrow, rsrc_a, nprow),iacol = indxg2p(ja, nb_a, mycol, csrc_a, npcol),mpa0 = numroc(m+iroffa, mb_a, myrow, iarow,nprow),nqa0 = numroc(n+icoffa, nb_a, mycol, iacol,npcol).indxg2p and numroc are ScaLAPACK tool functions; myrow,mycol, nprow, and npcol can be determined by calling thesubroutine blacs_gridinfo.If lwork = -1, then lwork is global input and a workspacequery is assumed; the routine only calculates the minimumand optimal size for all work arrays. Each of these valuesis returned in the first entry of the corresponding work array,and no error message is issued by pxerbla.

Output Parameters


a


(local) INTEGER.info

2046


= 0: successful exit< 0: if the i-th argument is an array and the j-entry hadan illegal value,then info = - (i*100+j),if the i-th argument is a scalar and had an illegal value,then info = -i.

p?orgl2/p?ungl2Generates all or part of the orthogonal/unitarymatrix Q from an LQ factorization determined byp?gelqf (unblocked algorithm).

Syntax

call psorgl2(m, n, k, a, ia, ja, desca, tau, work, lwork, info)

call pdorgl2(m, n, k, a, ia, ja, desca, tau, work, lwork, info)

call pcungl2(m, n, k, a, ia, ja, desca, tau, work, lwork, info)

call pzungl2(m, n, k, a, ia, ja, desca, tau, work, lwork, info)

Description

The routine p?orgl2/p?ungl2 generates a m-by-n real/complex matrix Q denotingA(ia:ia+m-1, ja:ja+n-1) with orthonormal rows, which is defined as the first m rows of aproduct of k elementary reflectors of order n

Q = H(k) . . . H(2) H(1) (for real flavors),

Q = H(k)' . . . H(2)' H(1)' (for complex flavors)


Input Parameters



(global) INTEGER.n

2047

The number of columns to be operated on, that is, the

number of columns of the distributed submatrix Q. n ≥ m

≥ 0.



REAL for psorgl2aDOUBLE PRECISION for pdorgl2COMPLEX for pcungl2COMPLEX*16 for pzungl2.Pointer into the local memory to an array, DIMENSION(lld_a, LOCc(ja+n-1).On entry, the i-th row must contain the vector that defines

the elementary reflector H(i), ia ≤ i ≤ ia+k-1, asreturned by p?gelqf in the k rows of its distributed matrixargument A(ia:ia+k-1, ja:*).




desca

(local)tauREAL for psorgl2DOUBLE PRECISION for pdorgl2COMPLEX for pcungl2COMPLEX*16 for pzungl2.Array, DIMENSION LOCr(ja+k-1). This array contains thescalar factors tau(i) of the elementary reflectors H(i), asreturned by p?gelqf. This array is tied to the distributedmatrix A.

(local)WORKREAL for psorgl2

2048


DOUBLE PRECISION for pdorgl2COMPLEX for pcungl2COMPLEX*16 for pzungl2.Workspace array, DIMENSION (lwork).


lwork is local input and must be at least lwork ≥ nqa0 +max(1, mpa0), whereiroffa = mod(ia-1, mb_a),icoffa = mod(ja-1, nb_a),iarow = indxg2p(ia, mb_a, myrow, rsrc_a, nprow),iacol = indxg2p(ja, nb_a, mycol, csrc_a, npcol),mpa0 = numroc(m+iroffa, mb_a, myrow, iarow,nprow),nqa0 = numroc(n+icoffa, nb_a, mycol, iacol,npcol).indxg2p and numroc are ScaLAPACK tool functions; myrow,mycol, nprow, and npcol can be determined by calling thesubroutine blacs_gridinfo.If lwork = -1, then lwork is global input and a workspacequery is assumed; the routine only calculates the minimumand optimal size for all work arrays. Each of these valuesis returned in the first entry of the corresponding work array,and no error message is issued by pxerbla.

Output Parameters


a


(local) INTEGER.info= 0: successful exit< 0: if the i-th argument is an array and the j-entry hadan illegal value,then info = - (i*100+j),if the i-th argument is a scalar and had an illegal value,then info = -i.

2049

p?orgr2/p?ungr2Generates all or part of the orthogonal/unitarymatrix Q from an RQ factorization determined byp?gerqf (unblocked algorithm).

Syntax

call psorgr2(m, n, k, a, ia, ja, desca, tau, work, lwork, info)

call pdorgr2(m, n, k, a, ia, ja, desca, tau, work, lwork, info)

call pcungr2(m, n, k, a, ia, ja, desca, tau, work, lwork, info)

call pzungr2(m, n, k, a, ia, ja, desca, tau, work, lwork, info)

Description

The routine p?orgr2/p?ungr2 generates an m-by-n real/complex matrix Q denotingA(ia:ia+m-1, ja:ja+n-1) with orthonormal rows, which is defined as the last m rows of aproduct of k elementary reflectors of order n

Q = H(1) H(2) . . . H(k) (for real flavors)

Q = H(1)' H(2)' . . . H(k)' (for complex flavors)


Input Parameters




number of columns of the distributed submatrix Q. n ≥ m

≥ 0.



REAL for psorgr2aDOUBLE PRECISION for pdorgr2

2050


COMPLEX for pcungr2COMPLEX*16 for pzungr2.Pointer into the local memory to an array,DIMENSION(lld_a, LOCc(ja+n-1).On entry, the i-th row must contain the vector that defines

the elementary reflector H(i), ia+m-k ≤ i ≤ ia+m-1,as returned by p?gerqf in the k rows of its distributedmatrix argument A(ia+m-k:ia+m-1, ja:*).




desca

(local)tauREAL for psorgr2DOUBLE PRECISION for pdorgr2COMPLEX for pcungr2COMPLEX*16 for pzungr2.Array, DIMENSION LOCr(ja+m-1). This array contains thescalar factors tau(i) of the elementary reflectors H(i), asreturned by p?gerqf. This array is tied to the distributedmatrix A.

(local)workREAL for psorgr2DOUBLE PRECISION for pdorgr2COMPLEX for pcungr2COMPLEX*16 for pzungr2.Workspace array, DIMENSION (lwork).


lwork is local input and must be at least lwork ≥ nqa0 +max(1, mpa0 ), where iroffa = mod( ia-1, mb_a ),icoffa = mod( ja-1, nb_a ),

2051


iarow = indxg2p( ia, mb_a, myrow, rsrc_a, nprow),iacol = indxg2p( ja, nb_a, mycol, csrc_a, npcol),mpa0 = numroc( m+iroffa, mb_a, myrow, iarow,nprow ),nqa0 = numroc( n+icoffa, nb_a, mycol, iacol,npcol ).indxg2p and numroc are ScaLAPACK tool functions; myrow,mycol, nprow, and npcol can be determined by calling thesubroutine blacs_gridinfo.If lwork = -1, then lwork is global input and a workspacequery is assumed; the routine only calculates the minimumand optimal size for all work arrays. Each of these valuesis returned in the first entry of the corresponding work array,and no error message is issued by pxerbla.

Output Parameters


a



2052


p?orm2l/p?unm2lMultiplies a general matrix by theorthogonal/unitary matrix from a QL factorizationdetermined by p?geqlf (unblocked algorithm).

Syntax

call psorm2l(side, trans, m, n, k, a, ia, ja, desca, tau, c, ic, jc, descc,work, lwork, info)

call pdorm2l(side, trans, m, n, k, a, ia, ja, desca, tau, c, ic, jc, descc,work, lwork, info)

call pcunm2l(side, trans, m, n, k, a, ia, ja, desca, tau, c, ic, jc, descc,work, lwork, info)

call pzunm2l(side, trans, m, n, k, a, ia, ja, desca, tau, c, ic, jc, descc,work, lwork, info)

Description

The routine p?orm2l/p?unm2l overwrites the general real/complex m-by-n distributed matrixsub (C)=C(ic:ic+m-1,jc:jc+n-1) with

side = 'R'side = 'L'

sub (C)*QQ*sub (C)trans = 'N'

sub(C)*QTQT * sub(C)trans = 'T' (for real flavors)

sub(C)*QHQH * sub(C)trans = 'C' (for complex flavors)


Q = H(k). . . H(2) H(1)

as returned by p?geqlf . Q is of order m if side = 'L' and of order n if side = 'R'.

Input Parameters

(global) CHARACTER.side

2053


= 'L': apply Q or QT for real flavors (QH for complex flavors)from the left,= 'R': apply Q or QT for real flavors (QH for complex flavors)from the right.

(global) CHARACTER.trans= 'N': apply Q (no transpose)= 'T': apply QT (transpose, for real flavors)= 'C': apply QH (conjugate transpose, for complex flavors)


of rows of the distributed submatrix sub(C). m ≥ 0.

(global) INTEGER.nThe number of columns to be operated on, that is, thenumber of columns of the distributed submatrix sub(C). n

≥ 0.

(global) INTEGER.kThe number of elementary reflectors whose product definesthe matrix Q.

If side = 'L', m ≥ k ≥ 0;

if side = 'R', n ≥ k ≥ 0.

(local)aREAL for psorm2lDOUBLE PRECISION for pdorm2lCOMPLEX for pcunm2lCOMPLEX*16 for pzunm2l.Pointer into the local memory to an array,DIMENSION(lld_a, LOCc(ja+k-1).On entry, the j-th row must contain the vector that defines

the elementary reflector H(j), ja ≤ j ≤ ja+k-1, asreturned by p?geqlf in the k columns of its distributedmatrix argument A(ia:*,ja:ja+k-1). The argumentA(ia:*,ja:ja+k-1) is modified by the routine but restoredon exit.

If side = 'L', lld_a ≥ max(1, LOCr(ia+m-1)),

if side = 'R', lld_a ≥ max(1, LOCr(ia+n-1)).

2054





desca

(local)tauREAL for psorm2lDOUBLE PRECISION for pdorm2lCOMPLEX for pcunm2lCOMPLEX*16 for pzunm2l.Array, DIMENSIONLOCc(ja+n-1). This array contains thescalar factor tau(j) of the elementary reflector H(j), asreturned by p?geqlf. This array is tied to the distributedmatrix A.

(local)cREAL for psorm2lDOUBLE PRECISION for pdorm2lCOMPLEX for pcunm2lCOMPLEX*16 for pzunm2l.Pointer into the local memory to an array,DIMENSION(lld_c, LOCc(jc+n-1)).On entry, the localpieces of the distributed matrix sub (C).

(global) INTEGER.icThe row index in the global array C indicating the first rowof sub(C).

(global) INTEGER.jcThe column index in the global array C indicating the firstcolumn of sub(C).

(global and local) INTEGER array of DIMENSION (dlen_).The array descriptor for the distributed matrix C.

descc

(local)workREAL for psorm2lDOUBLE PRECISION for pdorm2l

2055


COMPLEX for pcunm2lCOMPLEX*16 for pzunm2l.Workspace array, DIMENSION (lwork).On exit, work(1) returns the minimal and optimal lwork.

(local or global) INTEGER.lworkThe dimension of the array work.lwork is local input and must be at least

if side = 'L', lwork ≥ mpc0 + max(1, nqc0),

if side = 'R', lwork ≥ nqc0 + max(max(1, mpc0),numroc(numroc(n+icoffc, nb_a, 0, 0, npcol),nb_a, 0, 0, lcmq)),wherelcmq = lcm/npcol,lcm = iclm(nprow, npcol),iroffc = mod(ic-1, mb_c),icoffc = mod(jc-1, nb_c),icrow = indxg2p(ic, mb_c, myrow, rsrc_c, nprow),iccol = indxg2p(jc, nb_c, mycol, csrc_c, npcol),Mqc0 = numroc(m+icoffc, nb_c, mycol, icrow,nprow),Npc0 = numroc(n+iroffc, mb_c, myrow, iccol,npcol),ilcm, indxg2p, and numroc are ScaLAPACK tool functions;myrow, mycol, nprow, and npcol can be determined bycalling the subroutine blacs_gridinfo.If lwork = -1, then lwork is global input and a workspacequery is assumed; the routine only calculates the minimumand optimal size for all work arrays. Each of these valuesis returned in the first entry of the corresponding work array,and no error message is issued by pxerbla.

Output Parameters

On exit, sub(C) is overwritten by Q*sub(C), or Q'*sub(C),or sub(C)*Q', or sub(C)*Q.

c


(local) INTEGER.info

2056


= 0: successful exit< 0: if the i-th argument is an array and the j-entry hadan illegal value,then info = - (i*100+j),if the i-th argument is a scalar and had an illegal value,then info = -i.

NOTE. The distributed submatrices A(ia:*, ja:*) and C(ic:ic+m-1,jc:jc+n-1)must verify some alignment properties, namely the following expressions should be true:

If side = 'L', ( mb_a.eq.mb_c .AND. iroffa.eq.iroffc .AND. iarow.eq.icrow)

If side = 'R', ( mb_a.eq.nb_c .AND. iroffa.eq.iroffc ).

p?orm2r/p?unm2rMultiplies a general matrix by theorthogonal/unitary matrix from a QR factorizationdetermined by p?geqrf (unblocked algorithm).

Syntax

call psorm2r(side, trans, m, n, k, a, ia, ja, desca, tau, c, ic, jc, descc,work, lwork, info)

call pdorm2r(side, trans, m, n, k, a, ia, ja, desca, tau, c, ic, jc, descc,work, lwork, info)

call pcunm2r(side, trans, m, n, k, a, ia, ja, desca, tau, c, ic, jc, descc,work, lwork, info)

call pzunm2r(side, trans, m, n, k, a, ia, ja, desca, tau, c, ic, jc, descc,work, lwork, info)

Description

The routine p?orm2r/p?unm2r overwrites the general real/complex m-by-n distributed matrixsub (C)=C(ic:ic+m-1, jc:jc+n-1) with


2057





Q = H(k). . . H(2) H(1)

as returned by p?geqrf . Q is of order m if side = 'L' and of order n if side = 'R'.

Input Parameters

(global) CHARACTER.side= 'L': apply Q or QT for real flavors (QH for complex flavors)from the left,= 'R': apply Q or QT for real flavors (QH for complex flavors)from the right.





≥ 0.


If side = 'L', m ≥ k ≥ 0;

if side = 'R', n ≥ k ≥ 0.

(local)a

2058


REAL for psorm2rDOUBLE PRECISION for pdorm2rCOMPLEX for pcunm2rCOMPLEX*16 for pzunm2r.Pointer into the local memory to an array,DIMENSION(lld_a, LOCc(ja+k-1).On entry, the j-th column must contain the vector that

defines the elementary reflector H(j), ja ≤ j ≤ ja+k-1,as returned by p?geqrf in the k columns of its distributedmatrix argument A(ia:*,ja:ja+k-1). The argumentA(ia:*,ja:ja+k-1) is modified by the routine but restoredon exit.

If side = 'L', lld_a ≥ max(1, LOCr(ia+m-1)),

if side = 'R', lld_a ≥ max(1, LOCr(ia+n-1)).




desca

(local)tauREAL for psorm2rDOUBLE PRECISION for pdorm2rCOMPLEX for pcunm2rCOMPLEX*16 for pzunm2r.Array, DIMENSION LOCc(ja+k-1). This array contains thescalar factors tau(j) of the elementary reflector H(j), asreturned by p?geqrf. This array is tied to the distributedmatrix A.

(local)cREAL for psorm2rDOUBLE PRECISION for pdorm2rCOMPLEX for pcunm2rCOMPLEX*16 for pzunm2r.

2059


Pointer into the local memory to an array,DIMENSION(lld_c, LOCc(jc+n-1)).On entry, the local pieces of the distributed matrix sub (C).



(global and local) INTEGER array of DIMENSION (dlen_).desccThe array descriptor for the distributed matrix C.

(local)workREAL for psorm2rDOUBLE PRECISION for pdorm2rCOMPLEX for pcunm2rCOMPLEX*16 for pzunm2r.Workspace array, DIMENSION (lwork).


if side = 'L', lwork ≥ mpc0 + max(1, nqc0),

if side = 'R', lwork ≥ nqc0 + max(max(1, mpc0),numroc(numroc(n+icoffc, nb_a, 0, 0, npcol),nb_a, 0, 0, lcmq)),wherelcmq = lcm/npcol ,lcm = iclm(nprow, npcol),iroffc = mod(ic-1, mb_c),icoffc = mod(jc-1, nb_c),icrow = indxg2p(ic, mb_c, myrow, rsrc_c, nprow),iccol = indxg2p(jc, nb_c, mycol, csrc_c, npcol),Mqc0 = numroc(m+icoffc, nb_c, mycol, icrow,nprow),Npc0 = numroc(n+iroffc, mb_c, myrow, iccol,npcol),

2060


ilcm, indxg2p and numroc are ScaLAPACK tool functions;myrow, mycol, nprow, and npcol can be determined bycalling the subroutine blacs_gridinfo.If lwork = -1, then lwork is global input and a workspacequery is assumed; the routine only calculates the minimumand optimal size for all work arrays. Each of these valuesis returned in the first entry of the corresponding work array,and no error message is issued by pxerbla.

Output Parameters

On exit, sub(C) is overwritten by Q*sub(C), or Q'*sub(C),or sub(C)*Q', or sub(C)*Q.

c



NOTE. The distributed submatrices A(ia:*, ja:*) and C(ic:ic+m-1, jc:jc+n-1)must verify some alignment properties, namely the following expressions should be true:

If side = 'L', (mb_a.eq.mb_c .AND. iroffa.eq.iroffc .AND. iarow.eq.icrow)

If side = 'R', (mb_a.eq.nb_c .AND. iroffa.eq.iroffc).

2061


p?orml2/p?unml2Multiplies a general matrix by theorthogonal/unitary matrix from an LQ factorizationdetermined by p?gelqf (unblocked algorithm).

Syntax

call psorml2(side, trans, m, n, k, a, ia, ja, desca, tau, c, ic, jc, descc,work, lwork, info)

call pdorml2(side, trans, m, n, k, a, ia, ja, desca, tau, c, ic, jc, descc,work, lwork, info)

call pcunml2(side, trans, m, n, k, a, ia, ja, desca, tau, c, ic, jc, descc,work, lwork, info)

call pzunml2(side, trans, m, n, k, a, ia, ja, desca, tau, c, ic, jc, descc,work, lwork, info)

Description

The routine p?orml2/p?unml2 overwrites the general real/complex m-by-n distributed matrixsub (C)=C(ic:ic+m-1, jc:jc+n-1) with





where Q is a real orthogonal or complex unitary distributed matrix defined as the product of kelementary reflectors

Q = H(k). . . H(2) H(1) (for real flavors)

Q = H(k)' . . . H(2)' H(1)' (for complex flavors)

as returned by p?gelqf . Q is of order m if side = 'L' and of order n if side = 'R'.

2062


Input Parameters






≥ 0.


If side = 'L', m ≥ k ≥ 0;

if side = 'R', n ≥ k ≥ 0.

(local)aREAL for psorml2DOUBLE PRECISION for pdorml2COMPLEX for pcunml2COMPLEX*16 for pzunml2.Pointer into the local memory to an array, DIMENSION(lld_a, LOCc(ja+m-1) if side='L',(lld_a, LOCc(ja+n-1) if side='R',

where lld_a ≥ max (1, LOCr(ia+k-1)).On entry, the i-th row must contain the vector that defines

the elementary reflector H(i), ia ≤ i ≤ ia+k-1, asreturned by p?gelqf in the k rows of its distributed matrix

2063


argument A(ia:ia+k-1, ja:*). The argumentA(ia:ia+k-1, ja:*) is modified by the routine butrestored on exit.




desca

(local)tauREAL for psorml2DOUBLE PRECISION for pdorml2COMPLEX for pcunml2COMPLEX*16 for pzunml2.Array, DIMENSION LOCc(ia+k-1). This array contains thescalar factors tau(i) of the elementary reflector H(i), asreturned by p?gelqf. This array is tied to the distributedmatrix A.

(local)cREAL for psorml2DOUBLE PRECISION for pdorml2COMPLEX for pcunml2COMPLEX*16 for pzunml2.Pointer into the local memory to an array,DIMENSION(lld_c, LOCc(jc+n-1)). On entry, the localpieces of the distributed matrix sub (C).




descc

2064


(local)workREAL for psorml2DOUBLE PRECISION for pdorml2COMPLEX for pcunml2COMPLEX*16 for pzunml2.Workspace array, DIMENSION (lwork).


if side = 'L', lwork ≥ mqc0 + max(max( 1, npc0),numroc(numroc(m+icoffc, mb_a, 0, 0, nprow),mb_a, 0, 0, lcmp)),

if side = 'R', lwork ≥ npc0 + max(1, mqc0),wherelcmp = lcm / nprow,lcm = iclm(nprow, npcol),iroffc = mod(ic-1, mb_c),icoffc = mod(jc-1, nb_c),icrow = indxg2p(ic, mb_c, myrow, rsrc_c, nprow),iccol = indxg2p(jc, nb_c, mycol, csrc_c, npcol),Mpc0 = numroc(m+icoffc, mb_c, mycol, icrow,nprow),Nqc0 = numroc(n+iroffc, nb_c, myrow, iccol,npcol),ilcm, indxg2p and numroc are ScaLAPACK tool functions;myrow, mycol, nprow, and npcol can be determined bycalling the subroutine blacs_gridinfo.If lwork = -1, then lwork is global input and a workspacequery is assumed; the routine only calculates the minimumand optimal size for all work arrays. Each of these valuesis returned in the first entry of the corresponding work array,and no error message is issued by pxerbla.

Output Parameters

On exit, sub(C) is overwritten by Q*sub(C) ,or Q'*sub(C),or sub(C)*Q', or sub(C)*Q.

c


2065



NOTE. The distributed submatrices A(ia:*, ja:*) and C(ic:ic+m-1, jc:jc+n-1)must verify some alignment properties, namely the following expressions should be true:

If side = 'L', (nb_a.eq.mb_c .AND. icoffa.eq.iroffc)

If side = 'R', (nb_a.eq.nb_c .AND. icoffa.eq.icoffc .AND. iacol.eq.iccol).

p?ormr2/p?unmr2Multiplies a general matrix by theorthogonal/unitary matrix from an RQ factorizationdetermined by p?gerqf (unblocked algorithm).

Syntax

call psormr2(side, trans, m, n, k, a, ia, ja, desca, tau, c, ic, jc, descc,work, lwork, info)

call pdormr2(side, trans, m, n, k, a, ia, ja, desca, tau, c, ic, jc, descc,work, lwork, info)

call pcunmr2(side, trans, m, n, k, a, ia, ja, desca, tau, c, ic, jc, descc,work, lwork, info)

call pzunmr2(side, trans, m, n, k, a, ia, ja, desca, tau, c, ic, jc, descc,work, lwork, info)

Description

The routine p?ormr2/p?unmr2 overwrites the general real/complex m-by-n distributed matrixsub (C)=C(ic:ic+m-1, jc:jc+n-1) with


2066





where Q is a real orthogonal or complex unitary distributed matrix defined as the product of kelementary reflectors

Q = H(1) H(2). . . H(k) (for real flavors)

Q = H(1)' H(2)' . . . H(k)' (for complex flavors)

as returned by p?gerqf . Q is of order m if side = 'L' and of order n if side = 'R'.

Input Parameters






≥ 0.


If side = 'L', m ≥ k ≥ 0;

if side = 'R', n ≥ k ≥ 0.

2067


(local)aREAL for psormr2DOUBLE PRECISION for pdormr2COMPLEX for pcunmr2COMPLEX*16 for pzunmr2.Pointer into the local memory to an array, DIMENSION(lld_a, LOCc(ja+m-1) if side='L',(lld_a, LOCc(ja+n-1) if side='R',

where lld_a ≥ max (1, LOCr(ia+k-1)).On entry, the i-th row must contain the vector that defines

the elementary reflector H(i), ia ≤ i ≤ ia+k-1, asreturned by p?gerqf in the k rows of its distributed matrixargument A(ia:ia+k-1, ja:*).The argument A(ia:ia+k-1, ja:*) is modified by theroutine but restored on exit.




desca

(local)tauREAL for psormr2DOUBLE PRECISION for pdormr2COMPLEX for pcunmr2COMPLEX*16 for pzunmr2.Array, DIMENSION LOCc(ia+k-1). This array contains thescalar factors tau(i) of the elementary reflector H(i), asreturned by p?gerqf. This array is tied to the distributedmatrix A.

(local)cREAL for psormr2DOUBLE PRECISION for pdormr2COMPLEX for pcunmr2

2068


COMPLEX*16 for pzunmr2.Pointer into the local memory to an array,DIMENSION(lld_c, LOCc(jc+n-1)). On entry, the localpieces of the distributed matrix sub (C).




descc

(local)workREAL for psormr2DOUBLE PRECISION for pdormr2COMPLEX for pcunmr2COMPLEX*16 for pzunmr2.Workspace array, DIMENSION (lwork).


if side = 'L', lwork ≥ mpc0 + max(max(1, nqc0),numroc(numroc(m+iroffc, mb_a, 0, 0, nprow),mb_a, 0, 0, lcmp)),

if side = 'R', lwork ≥ nqc0 + max(1, mpc0),where lcmp = lcm/nprow,lcm = iclm(nprow, npcol),iroffc = mod(ic-1, mb_c),icoffc = mod(jc-1, nb_c),icrow = indxg2p(ic, mb_c, myrow, rsrc_c, nprow),iccol = indxg2p(jc, nb_c, mycol, csrc_c, npcol),Mpc0 = numroc(m+iroffc, mb_c, myrow, icrow,nprow),Nqc0 = numroc(n+icoffc, nb_c, mycol, iccol,npcol),

2069


ilcm, indxg2p and numroc are ScaLAPACK tool functions;myrow, mycol, nprow, and npcol can be determined bycalling the subroutine blacs_gridinfo.If lwork = -1, then lwork is global input and a workspacequery is assumed; the routine only calculates the minimumand optimal size for all work arrays. Each of these valuesis returned in the first entry of the corresponding work array,and no error message is issued by pxerbla.

Output Parameters

On exit, sub(C) is overwritten by Q*sub(C), or Q' *sub(C),or sub(C)*Q', or sub(C)*Q.

c



NOTE. The distributed submatrices A(ia:*, ja:*) and C(ic:ic+m-1,jc:jc+n-1)must verify some alignment properties, namely the following expressions should be true:

If side = 'L', ( nb_a.eq.mb_c .AND. icoffa.eq.iroffc )

If side = 'R', ( nb_a.eq.nb_c .AND. icoffa.eq.icoffc .AND. iacol.eq.iccol).

2070


p?pbtrsvSolves a single triangular linear system viafrontsolve or backsolve where the triangular matrixis a factor of a banded matrix computed byp?pbtrf.

Syntax

call pspbtrsv(uplo, trans, n, bw, nrhs, a, ja, desca, b, ib, descb, af, laf,work, lwork, info)

call pdpbtrsv(uplo, trans, n, bw, nrhs, a, ja, desca, b, ib, descb, af, laf,work, lwork, info)

call pcpbtrsv(uplo, trans, n, bw, nrhs, a, ja, desca, b, ib, descb, af, laf,work, lwork, info)

call pzpbtrsv(uplo, trans, n, bw, nrhs, a, ja, desca, b, ib, descb, af, laf,work, lwork, info)

Description

The routine p?pbtrsv solves a banded triangular system of linear equations

A(1:n, ja:ja+n-1)*X = B(jb:jb+n-1, 1:nrhs)

or

A(1:n, ja:ja+n-1)T *X = B(jb:jb+n-1, 1:nrhs) for real flavors,

A(1:n, ja:ja+n-1)H *X = B(jb:jb+n-1, 1:nrhs) for complex flavors,

where A(1:n, ja:ja+n-1) is a banded triangular matrix factor produced by the Choleskyfactorization code p?pbtrf and is stored in A(1:n, ja:ja+n-1) and af. The matrix stored inA(1:n, ja:ja+n-1) is either upper or lower triangular according to uplo, and the choice ofsolving A(1:n, ja:ja+n-1) or A(1:n, ja:ja+n-1)T for real flavors and A(1:n, ja:ja+n-1)H

for complex flavors respectively is dictated by the user by the parameter trans.

Routine p?pbtrf must be called first.

Input Parameters

(global) CHARACTER. Must be 'U' or 'L'.uploIf uplo = 'U', upper triangle of A(1:n, ja:ja+n-1) isstored;

2071


If uplo = 'L', lower triangle of A(1:n, ja:ja+n-1) isstored.

(global) CHARACTER. Must be 'N' or 'T' or 'C'.transIf trans = 'N', solve with A(1:n, ja:ja+n-1);If trans = 'T' or 'C' for real flavors, solve with A(1:n,ja:ja+n-1)T.If trans = 'C' for complex flavors, solve with conjugatetranspose(A(1:n, ja:ja+n-1)H.

(global) INTEGER.nThe number of rows and columns to be operated on, thatis, the order of the distributed submatrix A(1:n,

ja:ja+n-1). n ≥ 0.

(global) INTEGER.bw

The number of subdiagonals in 'L' or 'U', 0 ≤ bw ≤ n-1.

(global) INTEGER.nrhsThe number of right hand sides; the number of columns ofthe distributed submatrix B(jb:jb+n-1, 1:nrhs); nrhs

≥ 0.

(local)aREAL for pspbtrsvDOUBLE PRECISION for pdpbtrsvCOMPLEX for pcpbtrsvCOMPLEX*16 for pzpbtrsv.Pointer into the local memory to an array with the first

DIMENSION lld_a ≥(bw+1), stored in desca.On entry, this array contains the local pieces of the n-by-nsymmetric banded distributed Cholesky factor L orLT*A(1:n, ja:ja+n-1).This local portion is stored in the packed banded formatused in LAPACK. Please see the Application Notes below andthe ScaLAPACK manual for more detail on the format ofdistributed matrices.


ja

2072



desca

If 1D type (dtype_a = 501), then dlen ≥ 7;

If 2D type (dtype_a = 1), then dlen ≥ 9.Contains information on mapping of A to memory. Please,see ScaLAPACK manual for full description and options.

(local)bREAL for pspbtrsvDOUBLE PRECISION for pdpbtrsvCOMPLEX for pcpbtrsvCOMPLEX*16 for pzpbtrsv.Pointer into the local memory to an array of local lead

DIMENSION lld_b ≥ nb.On entry, this array contains the local pieces of the righthand sides B(jb:jb+n-1, 1:nrhs).


ib


descb

If 1D type (dtype_b = 502), then dlen≥ 7;

If 2D type (dtype_b = 1), then dlen≥ 9.Contains information on mapping of B to memory. Please,see ScaLAPACK manual for full description and options.

(local)lafINTEGER. The size of user-input auxiliary Fillin space af.

Must be laf ≥ (nb+2*bw)*bw . If laf is not large enough,an error code will be returned and the minimum acceptablesize will be returned in af(1).

(local)workREAL for pspbtrsvDOUBLE PRECISION for pdpbtrsvCOMPLEX for pcpbtrsvCOMPLEX*16 for pzpbtrsv.

2073


The array work is a temporary workspace array ofDIMENSION lwork. This space may be overwritten inbetween calls to routines.

(local or global) INTEGER. The size of the user-input

workspace work, must be at least lwork ≥ bw*nrhs. Iflwork is too small, the minimal acceptable size will bereturned in work(1) and an error code is returned.

lwork

Output Parameters

(local)afREAL for pspbtrsvDOUBLE PRECISION for pdpbtrsvCOMPLEX for pcpbtrsvCOMPLEX*16 for pzpbtrsv.The array af is of DIMENSION laf. It contains auxiliary Fillinspace. Fillin is created during the factorization routinep?pbtrf and this is stored in af. If a linear system is to besolved using p?pbtrs after the factorization routine, afmust not be altered after the factorization.

On exit, this array contains the local piece of the solutionsdistributed matrix X.

b

On exit, work(1) contains the minimum value of lwork.work(1)


Application Notes

If the factorization routine and the solve routine are to be called separately to solve varioussets of right-hand sides using the same coefficient matrix, the auxiliary space af must not bealtered between calls to the factorization routine and the solve routine.

2074


The best algorithm for solving banded and tridiagonal linear systems depends on a variety ofparameters, especially the bandwidth. Currently, only algorithms designed for the case N/P>> bw are implemented. These algorithms go by many names, including Divide and Conquer,Partitioning, domain decomposition-type, etc.

The Divide and Conquer algorithm assumes the matrix is narrowly banded compared with thenumber of equations. In this situation, it is best to distribute the input matrix Aone-dimensionally, with columns atomic and rows divided amongst the processes. The basicalgorithm divides the banded matrix up into P pieces with one stored on each processor, andthen proceeds in 2 phases for the factorization or 3 for the solution of a linear system.

1. Local Phase: The individual pieces are factored independently and in parallel. These factorsare applied to the matrix creating fill-in, which is stored in a non-inspectable way in auxiliaryspace af. Mathematically, this is equivalent to reordering the matrix A as PAPT and thenfactoring the principal leading submatrix of size equal to the sum of the sizes of the matricesfactored on each processor. The factors of these submatrices overwrite the correspondingparts of A in memory.

2. Reduced System Phase: A small (bw*(P-1)) system is formed representing interactionof the larger blocks and is stored (as are its factors) in the space af. A parallel Block CyclicReduction algorithm is used. For a linear system, a parallel front solve followed by ananalogous backsolve, both using the structure of the factored matrix, are performed.

3. Backsubsitution Phase: For a linear system, a local backsubstitution is performed on eachprocessor in parallel.

2075


p?pttrsvSolves a single triangular linear system viafrontsolve or backsolve where the triangular matrixis a factor of a tridiagonal matrix computed byp?pttrf .

Syntax

call pspttrsv(uplo, n, nrhs, d, e, ja, desca, b, ib, descb, af, laf, work,lwork, info)

call pdpttrsv(uplo, n, nrhs, d, e, ja, desca, b, ib, descb, af, laf, work,lwork, info)

call pcpttrsv(uplo, trans, n, nrhs, d, e, ja, desca, b, ib, descb, af, laf,work, lwork, info)

call pzpttrsv(uplo, trans, n, nrhs, d, e, ja, desca, b, ib, descb, af, laf,work, lwork, info)

Description

This routine solves a tridiagonal triangular system of linear equations

A(1:n, ja:ja+n-1)*X = B(jb:jb+n-1, 1:nrhs)

or

A(1:n, ja:ja+n-1)T*X = B(jb:jb+n-1, 1:nrhs) for real flavors,

A(1:n, ja:ja+n-1)H*X = B(jb:jb+n-1, 1:nrhs) for complex flavors,

where A(1:n, ja:ja+n-1) is a tridiagonal triangular matrix factor produced by the Choleskyfactorization code p?pttrf and is stored in A(1:n, ja:ja+n-1) and af. The matrix stored inA(1:n, ja:ja+n-1) is either upper or lower triangular according to uplo, and the choice ofsolving A(1:n, ja:ja+n-1) or A(1:n, ja:ja+n-1)T for real flavors and A(1:n, ja:ja+n-1)H

for complex flavors respectively is dictated by the user by the parameter trans.

Routine p?pttrf must be called first.

Input Parameters

(global) CHARACTER. Must be 'U' or 'L'.uploIf uplo = 'U', upper triangle of A(1:n, ja:ja+n-1) isstored;

2076


If uplo = 'L', lower triangle of A(1:n, ja:ja+n-1) isstored.

(global) CHARACTER. Must be 'N' or 'C'.transIf trans = 'N', solve with A(1:n, ja:ja+n-1);If trans = 'C' (for complex flavors), solve with conjugatetranspose (A(1:n, ja:ja+n-1))H.

(global) INTEGER.nThe number of rows and columns to be operated on, thatis, the order of the distributed submatrix A(1:n,

ja:ja+n-1). n ≥ 0.

(global) INTEGER.nrhsThe number of right hand sides; the number of columns of

the distributed submatrix B(jb:jb+n-1, 1:nrhs); nrhs ≥0.

(local)dREAL for pspttrsvDOUBLE PRECISION for pdpttrsvCOMPLEX for pcpttrsvCOMPLEX*16 for pzpttrsv.Pointer to the local part of the global vector storing the main

diagonal of the matrix; must be of size ≥ desca(nb_).

(local)eREAL for pspttrsvDOUBLE PRECISION for pdpttrsvCOMPLEX for pcpttrsvCOMPLEX*16 for pzpttrsv.Pointer to the local part of the global vector storing the

upper diagonal of the matrix; must be of size ≥ desca(nb_).Globally, du(n) is not referenced, and du must be alignedwith d.


ja


desca

2077


If 1D type (dtype_a = 501 or 502), then dlen ≥ 7;

If 2D type (dtype_a = 1), then dlen ≥ 9.Contains information on mapping of A to memory. Please,see ScaLAPACK manual for full description and options.

(local)bREAL for pspttrsvDOUBLE PRECISION for pdpttrsvCOMPLEX for pcpttrsvCOMPLEX*16 for pzpttrsv.Pointer into the local memory to an array of local lead

DIMENSION lld_b ≥ nb.On entry, this array contains the local pieces of the righthand sides B(jb:jb+n-1, 1:nrhs).


ib


descb

If 1D type (dtype_b = 502), then dlen ≥ 7;

If 2D type (dtype_b = 1), then dlen ≥ 9.Contains information on mapping of B to memory. Please,see ScaLAPACK manual for full description and options.

(local)lafINTEGER. The size of user-input auxiliary Fillin space af.

Must be laf ≥ (nb+2*bw)*bw.If laf is not large enough, an error code will be returnedand the minimum acceptable size will be returned in af(1).

(local)workREAL for pspttrsvDOUBLE PRECISION for pdpttrsvCOMPLEX for pcpttrsvCOMPLEX*16 for pzpttrsv.The array work is a temporary workspace array ofDIMENSION lwork. This space may be overwritten inbetween calls to routines.

2078


(local or global) INTEGER. The size of the user-input

workspace work, must be at least lwork ≥(10+2*min(100,nrhs))*npcol+4*nrhs. If lwork is too small, the minimalacceptable size will be returned in work(1) and an errorcode is returned.

lwork

Output Parameters

(local).d, eREAL for pspttrsvDOUBLE PRECISION for pdpttrsvCOMPLEX for pcpttrsvCOMPLEX*16 for pzpttrsv.On exit, these arrays contain information on the factors ofthe matrix.

(local)afREAL for pspttrsvDOUBLE PRECISION for pdpttrsvCOMPLEX for pcpttrsvCOMPLEX*16 for pzpttrsv.The array af is of DIMENSION laf. It contains auxiliary Fillinspace. Fillin is created during the factorization routinep?pbtrf and this is stored in af. If a linear system is to besolved using p?pttrs after the factorization routine, afmust not be altered after the factorization.

On exit, this array contains the local piece of the solutionsdistributed matrix X.

b

On exit, work(1) contains the minimum value of lwork.work(1)


2079


p?potf2Computes the Cholesky factorization of asymmetric/Hermitian positive definite matrix (localunblocked algorithm).

Syntax

call pspotf2(uplo, n, a, ia, ja, desca, info)

call pdpotf2(uplo, n, a, ia, ja, desca, info)

call pcpotf2(uplo, n, a, ia, ja, desca, info)

call pzpotf2(uplo, n, a, ia, ja, desca, info)

Description

This routine computes the Cholesky factorization of a real symmetric or complex Hermitianpositive definite distributed matrix sub (A)=A(ia:ia+n-1, ja:ja+n-1).


sub (A) = U' U, if uplo = 'U', or

sub (A) = LL', if uplo = 'L',

where U is an upper triangular matrix and L is lower triangular.

Input Parameters

(global) CHARACTER.uploSpecifies whether the upper or lower triangular part of thesymmetric/Hermitian matrix A is stored.= 'U': upper triangle of sub (A) is stored;= 'L': lower triangle of sub (A) is stored.


is, the order of the distributed submatrix sub (A). n ≥ 0.

(local)aREAL for pspotf2DOUBLE PRECISION or pdpotf2COMPLEX for pcpotf2COMPLEX*16 for pzpotf2.

2080


Pointer into the local memory to an array ofDIMENSION(lld_a, LOCc(ja+n-1)) containing the localpieces of the n-by-n symmetric distributed matrix sub(A) tobe factored.If uplo = 'U', the leading n-by-n upper triangular part ofsub(A) contains the upper triangular matrix and the strictlylower triangular part of this matrix is not referenced.If uplo = 'L', the leading n-by-n lower triangular part ofsub(A) contains the lower triangular matrix and the strictlyupper triangular part of sub(A) is not referenced.

(global) INTEGER.ia, jaThe row and column indices in the global array A indicatingthe first row and the first column of the sub(A), respectively.


desca

Output Parameters

(local)aOn exit,if uplo = 'U', the upper triangular part of the distributedmatrix contains the Cholesky factor U;if uplo = 'L', the lower triangular part of the distributedmatrix contains the Cholesky factor L.

(local) INTEGER.info= 0: successful exit< 0: if the i-th argument is an array and the j-entry hadan illegal value,then info = - (i*100+j),if the i-th argument is a scalar and had an illegal value,then info = -i.> 0: if info = k, the leading minor of order k is notpositive definite, and the factorization could not becompleted.

2081

p?rsclMultiplies a vector by the reciprocal of a real scalar.

Syntax

call psrscl(n, sa, sx, ix, jx, descx, incx)

call pdrscl(n, sa, sx, ix, jx, descx, incx)

call pcsrscl(n, sa, sx, ix, jx, descx, incx)

call pzdrscl(n, sa, sx, ix, jx, descx, incx)

Description

This routine multiplies an n-element real/complex vector sub(x) by the real scalar 1/a. This isdone without overflow or underflow as long as the final result sub(x)/a does not overflow orunderflow.

sub(x) denotes x(ix:ix+n-1, jx:jx), if incx = 1,

and x(ix:ix, jx:jx+n-1), if incx = m_x.

Input Parameters

(global) INTEGER.nThe number of components of the distributed vector sub(x).

n ≥ 0.

REAL for psrscl/pcsrsclsaDOUBLE PRECISION for pdrscl/pzdrscl.The scalar a that is used to divide each component of the

vector x. This argument must be ≥ 0, or the subroutine willdivide by zero.

REAL forpsrsclsxDOUBLE PRECISION for pdrsclCOMPLEX for pcsrsclCOMPLEX*16 for pzdrscl.Array containing the local pieces of a distributed matrix ofDIMENSION of at least ((jx-1)*m_x + ix +(n-1)*abs(incx)). This array contains the entries of thedistributed vector sub(x).

2082


(global) INTEGER.The row index of the submatrix of thedistributed matrix X to operate on.

ix

(global) INTEGER.jxThe column index of the submatrix of the distributed matrixX to operate on.

(global and local). INTEGER.descxArray of DIMENSION 8. The array descriptor for thedistributed matrix X.

(global) INTEGER.incxThe increment for the elements of X. This version supportsonly two values of incx, namely 1 and m_x.

Output Parameters

On exit, the result x/a.sx

p?sygs2/p?hegs2Reduces a symmetric/Hermitian definitegeneralized eigenproblem to standard form, usingthe factorization results obtained from p?potrf(local unblocked algorithm).

Syntax

call pssygs2(ibtype, uplo, n, a, ia, ja, desca, b, ib, jb, descb, info)

call pdsygs2(ibtype, uplo, n, a, ia, ja, desca, b, ib, jb, descb, info)

call pchegs2(ibtype, uplo, n, a, ia, ja, desca, b, ib, jb, descb, info)

call pzhegs2(ibtype, uplo, n, a, ia, ja, desca, b, ib, jb, descb, info)

Description

The routine p?sygs2/p?hegs2 reduces a real symmetric-definite or a complex Hermitian-definitegeneralized eigenproblem to standard form.

sub(A) denotes A(ia:ia+n-1, ja:ja+n-1) and sub(B) denotes B(ib:ib+n-1, jb:jb+n-1).



2083


and sub(A) is overwritten by

inv(UT)*sub(A)*inv(U) or inv(L)*sub(A)*inv(LT) for real flavors and

inv(UH)*sub(A)*inv(U) or inv(L)*sub(A)*inv(LH) for complex flavors.


sub(A)*sub(B)x = λ*x or sub(B)*sub(A)x =λ*x,

and sub(A) is overwritten

by U*sub(A)*UT or L**T*sub(A)*L for real flavors and

by U*sub(A)*UH or L**H*sub(A)*L for complex flavors.

sub(B) must have been previously factorized as UT*U or L*LT (for real flavors) or as UH*U orL*LH (for complex flavors) by p?potrf.

Input Parameters

(global) INTEGER.ibtype= 1: compute inv(UT)*sub(A)*inv(U) orinv(L)*sub(A)*inv(LT) for real subroutines, andinv(UH)*sub(A)*inv(U) or inv(L)*sub(A)*inv(LH) for complexsubroutines;= 2 or 3: compute U*sub(A)*UT or LT*sub(A)*L for realsubroutines, and by U*sub(A)*UH or LH*sub(A)*L for complexsubroutines.

(global) CHARACTERuploSpecifies whether the upper or lower triangular part of thesymmetric/Hermitian matrix sub(A) is stored, and howsub(B) is factorized.= 'U': Upper triangular of sub(A) is stored and sub(B) isfactorized as UT*U (for real subroutines) or as UH*U (forcomplex subroutines).= 'L': Lower triangular of sub(A) is stored and sub(B) isfactorized as L*LT (for real subroutines) or as L*LH (forcomplex subroutines)

(global) INTEGER.n

The order of the matrices sub(A) and sub(B). n ≥ 0.

(local)a

2084


REAL for pssygs2DOUBLE PRECISION for pdsygs2COMPLEX for pchegs2COMPLEX*16 for pzhegs2.Pointer into the local memory to an array,DIMENSION(lld_a, LOCc(ja+n-1)).On entry, this array contains the local pieces of the n-by-nsymmetric/Hermitian distributed matrix sub(A).If uplo = 'U', the leading n-by-n upper triangular part ofsub(A) contains the upper triangular part of the matrix, andthe strictly lower triangular part of sub(A) is not referenced.If uplo = 'L', the leading n-by-n lower triangular part ofsub(A) contains the lower triangular part of the matrix, andthe strictly upper triangular part of sub(A) is not referenced.



desca

(local)BREAL for pssygs2DOUBLE PRECISION for pdsygs2COMPLEX for pchegs2COMPLEX*16 for pzhegs2.Pointer into the local memory to an array,DIMENSION(lld_b, LOCc(jb+n-1)).On entry, this array contains the local pieces of thetriangular factor from the Cholesky factorization of sub(B)as returned by p?potrf.

(global) INTEGER.ib, jbThe row and column indices in the global array B indicatingthe first row and the first column of the sub(B), respectively.


descb

2085


Output Parameters

(local)aOn exit, if info = 0, the transformed matrix is stored inthe same format as sub(A).

INTEGER.info= 0: successful exit.< 0: if the i-th argument is an array and the j-entry hadan illegal value,then info = - (i*100),if the i-th argument is a scalar and had an illegal value,then info = -i.

p?sytd2/p?hetd2Reduces a symmetric/Hermitian matrix to realsymmetric tridiagonal form by anorthogonal/unitary similarity transformation (localunblocked algorithm).

Syntax

call pssytd2(uplo, n, a, ia, ja, desca, d, e, tau, work, lwork, info)

call pdsytd2(uplo, n, a, ia, ja, desca, d, e, tau, work, lwork, info)

call pchetd2(uplo, n, a, ia, ja, desca, d, e, tau, work, lwork, info)

call pzhetd2(uplo, n, a, ia, ja, desca, d, e, tau, work, lwork, info)

Description

The routine p?sytd2/p?hetd2 reduces a real symmetric/complex Hermitian matrix sub(A) tosymmetric/Hermitian tridiagonal form T by an orthogonal/unitary similarity transformation:

Q' sub(A)Q = T, where sub(A) = A(ia:ia+n-1, ja:ja+n-1).

Input Parameters

(global) CHARACTER.uploSpecifies whether the upper or lower triangular part of thesymmetric/Hermitian matrix sub(A) is stored:= 'U': upper triangular

2086

= 'L': lower triangular



(local)aREAL for pssytd2DOUBLE PRECISION for pdsytd2COMPLEX for pchetd2COMPLEX*16 for pzhetd2.Pointer into the local memory to an array,DIMENSION(lld_a, LOCc(ja+n-1)).On entry, this array contains the local pieces of the n-by-nsymmetric/Hermitian distributed matrix sub(A).If uplo = 'U', the leading n-by-n upper triangular part ofsub(A) contains the upper triangular part of the matrix, andthe strictly lower triangular part of sub(A) is not referenced.If uplo = 'L', the leading n-by-n lower triangular part ofsub(A) contains the lower triangular part of the matrix, andthe strictly upper triangular part of sub(A) is not referenced.



desca

(local)workREAL for pssytd2DOUBLE PRECISION for pdsytd2COMPLEX for pchetd2COMPLEX*16 for pzhetd2.The array work is a temporary workspace array ofDIMENSION lwork.

Output Parameters

On exit, if uplo = 'U', the diagonal and first superdiagonalof sub(A) are overwritten by the corresponding elements ofthe tridiagonal matrix T, and the elements above the first

a

2087


superdiagonal, with the array tau, represent theorthogonal/unitary matrix Q as a product of elementaryreflectors;if uplo = 'L', the diagonal and first subdiagonal of A areoverwritten by the corresponding elements of the tridiagonalmatrix T, and the elements below the first subdiagonal, withthe array tau, represent the orthogonal/unitary matrix Qas a product of elementary reflectors. See the ApplicationNotes below.

(local)dREAL for pssytd2/pchetd2DOUBLE PRECISION for pdsytd2/pzhetd2.Array, DIMENSION(LOCc(ja+n-1)). The diagonal elementsof the tridiagonal matrix T:d(i) = a(i,i); d is tied to the distributed matrix A.

(local)eREAL for pssytd2/pchetd2DOUBLE PRECISION for pdsytd2/pzhetd2.Array, DIMENSION(LOCc(ja+n-1)),if uplo = 'U', LOCc(ja+n-2) otherwise.The off-diagonal elements of the tridiagonal matrix T:e(i) = a(i,i+1) if uplo = 'U',e(i) = a(i+1,i) if uplo = 'L'.e is tied to the distributed matrix A.

(local)tauREAL for pssytd2DOUBLE PRECISION for pdsytd2COMPLEX for pchetd2COMPLEX*16 for pzhetd2.Array, DIMENSION(LOCc(ja+n-1)).The scalar factors of the elementary reflectors. tau is tiedto the distributed matrix A.

On exit, work(1) returns the minimal and optimal value oflwork.

work(1)

(local or global) INTEGER.lworkThe dimension of the workspace array work.

lwork is local input and must be at least lwork ≥ 3n.

2088


(local) INTEGER.info= 0: successful exit< 0: if the i-th argument is an array and the j-entry hadan illegal value,then info = - (i*100),if the i-th argument is a scalar and had an illegal value,then info = -i.

Application Notes


Q = H(n-1). . . H(2) H(1)


H(i) = I - tau*v*v',

where tau is a real/complex scalar, and v is a real/complex vector with v(i+1:n) = 0 andv(i) = 1; v(1:i-1) is stored on exit in A(ia:ia+i-2, ja+i), and tau in TAU(ja+i-1).


Q = H(1) H(2) . . . H(n-1).


H(i) = I - tau*v*v' ,

where tau is a real/complex scalar, and v is a real/complex vector with v(1:i) = 0 and v(i+1)= 1; v(i+2:n) is stored on exit in A(ia+i+1:ia+n-1, ja+i-1), and tau in TAU(ja+i-1).

The contents of sub (A) on exit are illustrated by the following examples with n = 5:

2089

where d and e denotes diagonal and off-diagonal elements of T, and vi denotes an element ofthe vector defining H(i).

NOTE. The distributed submatrix sub(A) must verify some alignment properties, namelythe following expression should be true:

( mb_a.eq.nb_a .AND. iroffa.eq.icoffa ) with iroffa = mod(ia - 1, mb_a)and icoffa = mod(ja -1, nb_a).

p?trti2Computes the inverse of a triangular matrix (localunblocked algorithm).

Syntax

call pstrti2(uplo, diag, n, a, ia, ja, desca, info)

call pdtrti2(uplo, diag, n, a, ia, ja, desca, info)

call pctrti2(uplo, diag, n, a, ia, ja, desca, info)

call pztrti2(uplo, diag, n, a, ia, ja, desca, info)

Description

This routine computes the inverse of a real/complex upper or lower triangular block matrix sub(A) = A(ia:ia+n-1, ja:ja+n-1).

This matrix should be contained in one and only one process memory space (local operation).

2090


Input Parameters

(global) CHARACTER*1.uploSpecifies whether the matrix sub (A) is upper or lowertriangular.= 'U': sub (A) is upper triangular= 'L': sub (A) is lower triangular.

(global) CHARACTER*1.diagSpecifies whether or not the matrix A is unit triangular.= 'N': sub (A) is non-unit triangular= 'U': sub (A) is unit triangular.

(global) INTEGER.nThe number of rows and columns to be operated on, i.e.,

the order of the distributed submatrix sub(A). n ≥ 0.

(local)aREAL for pstrti2DOUBLE PRECISION for pdtrti2COMPLEX for pctrti2COMPLEX*16 for pztrti2.Pointer into the local memory to an array,DIMENSION(lld_a, LOCc(ja+n-1)).On entry, this array contains the local pieces of thetriangular matrix sub(A).If uplo = 'U', the leading n-by-n upper triangular part ofthe matrix sub(A) contains the upper triangular part of thematrix, and the strictly lower triangular part of sub(A) is notreferenced.If uplo = 'L', the leading n-by-n lower triangular part ofthe matrix sub(A) contains the lower triangular part of thematrix, and the strictly upper triangular part of sub(A) isnot referenced. If diag = 'U', the diagonal elements ofsub(A) are not referenced either and are assumed to be 1.



desca

2091


Output Parameters

On exit, the (triangular) inverse of the original matrix, inthe same storage format.

a

INTEGER.info= 0: successful exit< 0: if the i-th argument is an array and the j-entry hadan illegal value,then info = - (i*100),if the i-th argument is a scalar and had an illegal value,then info = -i.

?lamshSends multiple shifts through a small (single node)matrix to maximize the number of bulges that canbe sent through.

Syntax

call slamsh(s, lds, nbulge, jblk, h, ldh, n, ulp)

call dlamsh(s, lds, nbulge, jblk, h, ldh, n, ulp)

Description

This routine sends multiple shifts through a small (single node) matrix to see how smallconsecutive subdiagonal elements are modified by subsequent shifts in an effort to maximizethe number of bulges that can be sent through. The subroutine should only be called whenthere are multiple shifts/bulges (nbulge > 1) and the first shift is starting in the middle ofan unreduced Hessenberg matrix because of two or more small consecutive subdiagonalelements.

Input Parameters

(local)sINTEGER. REAL for slamshDOUBLE PRECISION for dlamshArray, DIMENSION (lds,*).

2092

On entry, the matrix of shifts. Only the 2x2 diagonal of s isreferenced. It is assumed that s has jblk double shifts (size2).

(local) INTEGER.ldsOn entry, the leading dimension of S; unchanged on exit.

1<nbulge ≤ jblk ≤ lds/2.

(local) INTEGER.nbulgeOn entry, the number of bulges to send through h (>1).nbulge should be less than the maximum determined

(jblk). 1<nbulge ≤ jblk ≤ lds/2.

(local) INTEGER.jblkOn entry, the leading dimension of S; unchanged on exit.

(local) INTEGER.hREAL for slamshDOUBLE PRECISION for dlamshArray, DIMENSION (lds, n).On entry, the local matrix to apply the shifts on.h should be aligned so that the starting row is 2.

(local)ldhINTEGER.On entry, the leading dimension of H; unchanged on exit.

(local) INTEGER.nOn entry, the size of H. If all the bulges are expected to gothrough, n should be at least 4nbulge+2. Otherwise, nbulgemay be reduced by this routine.

(local)ulpREAL for slamshDOUBLE PRECISION for dlamshOn entry, machine precision. Unchanged on exit.

Output Parameters

On exit, the data is rearranged in the best order for applying.s

On exit, the maximum number of bulges that can be sentthrough.

nbulge

On exit, the data is destroyed.h

2093

?larefApplies Householder reflectors to matrices on eithertheir rows or columns.

Syntax

call slaref(type, a, lda, wantz, z, ldz, block, irow1, icol1, istart, istop,itmp1, itmp2, liloz, lihiz, vecs, v2, v3, t1, t2, t3)

call dlaref(type, a, lda, wantz, z, ldz, block, irow1, icol1, istart, istop,itmp1, itmp2, liloz, lihiz, vecs, v2, v3, t1, t2, t3)

Description

This routine applies one or several Householder reflectors of size 3 to one or two matrices (ifcolumn is specified) on either their rows or columns.

Input Parameters

(global) CHRACTER*1.typeIf type = 'R', apply reflectors to the rows of the matrix(apply from left). Otherwise, apply reflectors to the columnsof the matrix. Unchanged on exit.

(global) REAL for slarefaDOUBLE PRECISION for dlarefArray, DIMENSION (lda, *).On entry, the matrix to receive the reflections.

(local) INTEGER.ldaOn entry, the leading dimension of A; unchanged on exit.

(global) LOGICAL.wantzIf wantz = .TRUE., apply any column reflections to Z aswell.If wantz = .FALSE., do no additional work on Z.

(global) REAL for slarefzDOUBLE PRECISION for dlarefArray, DIMENSION (ldz, *).On entry, the second matrix to receive column reflections.

(local) INTEGER.ldzOn entry, the leading dimension of Z; unchanged on exit.

2094


(global). LOGICAL.block= .TRUE. : apply several reflectors at once and read theirdata from the vecs array;= .FALSE. : apply the single reflector given by v2, v3, t1,t2, and t3.

(local) INTEGER.irow1On entry, the local row element of the matrix A.

(local) INTEGER.icol1On entry, the local column element of the matrix A.

(global) INTEGER.istartSpecifies the "number" of the first reflector.istart is used as an index into vecs if block is set. istartis ignored if block is .FALSE. .

(global) INTEGER.istopSpecifies the "number" of the last reflector.istop is used as an index into vecs if block is set. istopis ignored if block is .FALSE. .

(local) INTEGER.itmp1Starting range into A. For rows, this is the local first column.For columns, this is the local first row.

(local) INTEGER.itmp2Ending range into A. For rows, this is the local last column.For columns, this is the local last row.

(local). INTEGER.liloz, lihizServe the same purpose as itmp1, itmp2 but for Z whenwantz is set.

(global)vecsREAL for slarefDOUBLE PRECISION for dlaref.Array of size 3 *n (matrix size). This array holds the size 3reflectors one after another and is only accessed when blockis .TRUE.

(global). INTEGER.v2,v3,t1,t2,t3REAL for slarefDOUBLE PRECISION for dlaref.

2095


These parameters hold information on a single size 3Householder reflector and are read when block is .FALSE.,and overwritten when block is .TRUE..

Output Parameters

On exit, the updated matrix.a

Changed only if wantz is set. If wantz is .FALSE. , z is notreferenced.

z

Undefined.irow1

Undefined.icol1

These parameters are read when block is .FALSE., andoverwritten when block is .TRUE..

v2,v3,t1,t2,t3

?lasorteSorts eigenpairs by real and complex data types.

Syntax

call slasorte(s, lds, j, out, info)

call dlasorte(s, lds, j, out, info)

Description

This routine sorts eigenpairs so that real eigenpairs are together and complex eigenpairs aretogether. This helps to employ 2x2 shifts easily since every 2nd subdiagonal is guaranteed tobe zero. This routine does no parallel work and makes no calls.

Input Parameters

(local) INTEGER.sREAL for slasorteDOUBLE PRECISION for dlasorteArray, DIMENSION (lds).On entry, a matrix already in Schur form.

(local) INTEGER.ldsOn entry, the leading dimension of the array s; unchangedon exit.

2096


(local) INTEGER.jOn entry, the order of the matrix S; unchanged on exit.

(local) INTEGER.outREAL for slasorteDOUBLE PRECISION for dlasorteArray, DIMENSION (2*j). The work buffer required by theroutine.

(local) INTEGER.infoSet, if the input matrix had an odd number of realeigenvalues and things could not be paired or if the inputmatrix S was not originally in Schur form. 0 indicatessuccessful completion.

Output Parameters

On exit, the diagonal blocks of S have been rewritten to pairthe eigenvalues. The resulting matrix is no longer similarto the input.

s

Work buffer.out

?lasrt2Sorts numbers in increasing or decreasing order.

Syntax

call slasrt2(id, n, d, key, info)

call dlasrt2(id, n, d, key, info)

Description

This routine is modified LAPACK routine ?lasrt, which sorts the numbers in d in increasingorder (if id = 'I') or in decreasing order (if id = 'D' ). It uses Quick Sort, reverting to

Insertion Sort on arrays of size ≤ 20. Dimension of STACK limits n to about 232.

Input Parameters

CHARACTER*1.id= 'I': sort d in increasing order;

2097


= 'D': sort d in decreasing order.

INTEGER. The length of the array d.n

REAL for slasrt2dDOUBLE PRECISION for dlasrt2.Array, DIMENSION (n).On entry, the array to be sorted.

INTEGER.keyArray, DIMENSION (n).On entry, key contains a key to each of the entries in d().Typically, key(i) = i for all i .

Output Parameters

On exit, d has been sorted into increasing orderD

(d(1) ≤ ... ≤ d(n) ) or into decreasing order

(d(1) ≥ ... ≥ d(n) ), depending on id.


On exit, key is permuted in exactly the same manner as d()was permuted from input to output. Therefore, if key(i)= i for all i upon input, then d_out(i) = d_in(key(i)).

key

?stein2Computes the eigenvectors corresponding tospecified eigenvalues of a real symmetrictridiagonal matrix, using inverse iteration.

Syntax

call sstein2(n, d, e, m, w, iblock, isplit, orfac, z, ldz, work, iwork, ifail,info)

call dstein2(n, d, e, m, w, iblock, isplit, orfac, z, ldz, work, iwork, ifail,info)

2098


Description

This routine is a modified LAPACK routine ?stein. It computes the eigenvectors of a realsymmetric tridiagonal matrix T corresponding to specified eigenvalues, using inverse iteration.

The maximum number of iterations allowed for each eigenvector is specified by an internalparameter maxits (currently set to 5).

Input Parameters


INTEGER. The number of eigenvectors to be found (0 ≤ m

≤ n).

m

REAL for single-precision flavorsd, e, wDOUBLE PRECISION for double-precision flavors.Arrays: d(*), DIMENSION (n). The n diagonal elements ofthe tridiagonal matrix T.e(*), DIMENSION (n).The (n-1) subdiagonal elements of the tridiagonal matrixT, in elements 1 to n-1. e(n) need not be set.w(*), DIMENSION (n).The first m elements of w contain the eigenvalues for whicheigenvectors are to be computed. The eigenvalues shouldbe grouped by split-off block and ordered from smallest tolargest within the block. (The output array w from ?stebzwith ORDER = 'B' is expected here).The dimension of w must be at least max(1, n).

INTEGER.iblockArray, DIMENSION (n).The submatrix indices associated with the correspondingeigenvalues in w ;iblock(i) = 1, if eigenvalue w(i) belongs to the firstsubmatrix from the top,iblock(i) = 2, if eigenvalue w(i) belongs to the secondsubmatrix, etc. (The output array iblock from ?stebz isexpected here).

INTEGER.isplitArray, DIMENSION (n).

2099


The splitting points, at which T breaks up into submatrices.The first submatrix consists of rows/columns 1 to isplit(1),the second submatrix consists of rows/columnsisplit(1)+1 through isplit( 2 ), etc. (The output arrayisplit from ?stebz is expected here).

REAL for single-precision flavorsorfacDOUBLE PRECISION for double-precision flavors.orfac specifies which eigenvectors should be orthogonalized.Eigenvectors that correspond to eigenvalues which are withinorfac*|| T || of each other are to be orthogonalized.


≥ max(1, n).

ldz

REAL for single-precision flavorsworkDOUBLE PRECISION for double-precision flavors.Workspace array, DIMENSION (5n).

INTEGER. Workspace array, DIMENSION (n).iwork

Output Parameters

REAL for sstein2zDOUBLE PRECISION for dstein2Array, DIMENSION (ldz, m).The computed eigenvectors. The eigenvector associatedwith the eigenvalue w(i) is stored in the i-th column of z.Any vector that fails to converge is set to its current iterateafter maxits iterations.

INTEGER.ifailArray, DIMENSION (m).On normal exit, all elements of ifail are zero. If one ormore eigenvectors fail to converge after maxits iterations,then their indices are stored in the array ifail.

INTEGER.infoinfo = 0, the exit is successful.info < 0: if info = -i, the i-th had an illegal value.info > 0: if info = i, then i eigenvectors failed toconverge in maxits iterations. Their indices are stored inthe array ifail.

2100

?dbtf2Computes an LU factorization of a general bandmatrix with no pivoting (local unblocked algorithm).

Syntax

call sdbtf2(m, n, kl, ku, ab, ldab, info)

call ddbtf2(m, n, kl, ku, ab, ldab, info)

call cdbtf2(m, n, kl, ku, ab, ldab, info)

call zdbtf2(m, n, kl, ku, ab, ldab, info)

Description

This routine computes an LU factorization of a general real/complex m-by-n band matrix Awithout using partial pivoting with row interchanges.

This is the unblocked version of the algorithm, calling BLAS Routines and Functions.

Input Parameters

INTEGER. The number of rows of the matrix A(m ≥ 0).m

INTEGER. The number of columns in A(n ≥ 0).n


A(kl ≥ 0).

kl


of A(ku ≥ 0).

ku

REAL for sdbtf2abDOUBLE PRECISION for ddbtf2COMPLEX for cdbtf2COMPLEX*16 for zdbtf2.Array, DIMENSION (ldab, n).The matrix A in band storage, in rows kl+1 to 2kl+ku+1;rows 1 to kl of the array need not be set. The j-th columnof A is stored in the j-th column of the array ab as follows:

ab(kl+ku+1+i-j,j) = A(i,j) for max(1,j-ku) ≤ i

≤ min(m,j+kl).

2101


Output Parameters

On exit, details of the factorization: U is stored as an uppertriangular band matrix with kl+ku superdiagonals in rows1 to kl+ku+1, and the multipliers used during thefactorization are stored in rows kl+ku+2 to 2*kl+ku+1.See the Application Notes below for further details.

ab

INTEGER.info= 0: successful exit< 0: if info = - i, the i-th argument had an illegal value,> 0: if info = + i, u(i,i) is 0. The factorization hasbeen completed, but the factor U is exactly singular. Divisionby 0 will occur if you use the factor U for solving a systemof linear equations.

Application Notes

The band storage scheme is illustrated by the following example, when m = n = 6, kl = 2,ku = 1:

The routine does not use array elements marked *; elements marked + need not be set onentry, but the routine requires them to store elements of U, because of fill-in resulting fromthe row interchanges.

2102

?dbtrfComputes an LU factorization of a general bandmatrix with no pivoting (local blocked algorithm).

Syntax

call sdbtrf(m, n, kl, ku, ab, ldab, info)

call ddbtrf(m, n, kl, ku, ab, ldab, info)

call cdbtrf(m, n, kl, ku, ab, ldab, info)

call zdbtrf(m, n, kl, ku, ab, ldab, info)

Description

This routine computes an LU factorization of a real m-by-n band matrix A without using partialpivoting or row interchanges.

This is the blocked version of the algorithm, calling BLAS Routines and Functions.

Input Parameters


INTEGER. The number of columns in A(n ≥ 0).n


A(kl ≥ 0).

kl


of A(ku ≥ 0).

ku

REAL for sdbtrfabDOUBLE PRECISION for ddbtrfCOMPLEX for cdbtrfCOMPLEX*16 for zdbtrf.Array, DIMENSION (ldab, n).The matrix A in band storage, in rows kl+1 to 2kl+ku+1;rows 1 to kl of the array need not be set. The j-th columnof A is stored in the j-th column of the array ab as follows:

ab(kl+ku+1+i-j,j) = A(i,j) for max(1,j-ku) ≤ i ≤min(m,j+kl).

2103


Output Parameters

On exit, details of the factorization: U is stored as an uppertriangular band matrix with kl+ku superdiagonals in rows1 to kl+ku+1, and the multipliers used during thefactorization are stored in rows kl+ku+2 to 2*kl+ku+1. Seethe Application Notes below for further details.

ab

INTEGER.info= 0: successful exit< 0: if info = - i, the i-th argument had an illegal value,> 0: if info = + i, u(i,i) is 0. The factorization hasbeen completed, but the factor U is exactly singular. Divisionby 0 will occur if you use the factor U for solving a systemof linear equations.

Application Notes

The band storage scheme is illustrated by the following example, when m = n = 6, kl = 2,ku = 1:

The routine does not use array elements marked *.

2104

?dttrfComputes an LU factorization of a generaltridiagonal matrix with no pivoting (local blockedalgorithm).

Syntax

call sdttrf(n, dl, d, du, info)

call ddttrf(n, dl, d, du, info)

call cdttrf(n, dl, d, du, info)

call zdttrf(n, dl, d, du, info)

Description

This routine computes an LU factorization of a real or complex tridiagonal matrix A usingelimination without partial pivoting.

The factorization has the form A = L*U, where L is a product of unit lower bidiagonal matricesand U is upper triangular with nonzeros only in the main diagonal and first superdiagonal.

Input Parameters

INTEGER. The order of the matrix A(n ≥ 0).n

REAL for sdttrfdl, d, duDOUBLE PRECISION for ddttrfCOMPLEX for cdttrfCOMPLEX*16 for zdttrf.Arrays containing elements of A.The array dl of DIMENSION(n - 1) contains thesub-diagonal elements of A.The array d of DIMENSION n contains the diagonal elementsof A.The array du of DIMENSION(n - 1) contains thesuper-diagonal elements of A.

Output Parameters

Overwritten by the (n-1) multipliers that define the matrixL from the LU factorization of A.

dl

2105


Overwritten by the n diagonal elements of the uppertriangular matrix U from the LU factorization of A.

d

Overwritten by the (n-1) elements of the firstsuper-diagonal of U.

du

INTEGER.info= 0: successful exit< 0: if info = - i, the i-th argument had an illegal value,> 0: if info = i, u(i,i) is exactly 0. The factorizationhas been completed, but the factor U is exactly singular.Division by 0 will occur if you use the factor U for solving asystem of linear equations.

?dttrsvSolves a general tridiagonal system of linearequations using the LU factorization computed by?dttrf.

Syntax

call sdttrsv(uplo, trans, n, nrhs, dl, d, du, b, ldb, info)

call ddttrsv(uplo, trans, n, nrhs, dl, d, du, b, ldb, info)

call cdttrsv(uplo, trans, n, nrhs, dl, d, du, b, ldb, info)

call zdttrsv(uplo, trans, n, nrhs, dl, d, du, b, ldb, info)

Description

This routine solves one of the following systems of linear equations:

L*X = B, LT*X = B, or LH*X = B,

U*X = B, UT*X = B, or UH*X = B

with factors of the tridiagonal matrix A from the LU factorization computed by ?dttrf.

Input Parameters

CHARACTER*1.uploSpecifies whether to solve with L or U.

CHARACTER. Must be 'N' or 'T' or 'C'.trans

2106

Indicates the form of the equations:If trans = 'N', then A*X = B is solved for X (notranspose).If trans = 'T', then AT*X = B is solved for X (transpose).If trans = 'C', then AH*X = B is solved for X (conjugatetranspose).

INTEGER. The order of the matrix A(n ≥ 0).n

INTEGER. The number of right-hand sides, that is, the

number of columns in the matrix B(nrhs ≥ 0).

nrhs

REAL for sdttrsvdl,d,du,bDOUBLE PRECISION for ddttrsvCOMPLEX for cdttrsvCOMPLEX*16 for zdttrsv.Arrays of DIMENSIONs: dl(n -1 ), d(n), du(n -1 ),b(ldb,nrhs).The array dl contains the (n - 1) multipliers that definethe matrix L from the LU factorization of A.The array d contains n diagonal elements of the uppertriangular matrix U from the LU factorization of A.The array du contains the (n - 1) elements of the firstsuper-diagonal of U.On entry, the array b contains the right-hand side matrixB.

INTEGER. The leading dimension of the array b; ldb ≥max(1, n).

ldb

Output Parameters



2107


?pttrsvSolves a symmetric (Hermitian) positive-definitetridiagonal system of linear equations, using theL*D*LH factorization computed by ?pttrf.

Syntax

call spttrsv(trans, n, nrhs, d, e, b, ldb, info)

call dpttrsv(trans, n, nrhs, d, e, b, ldb, info)

call cpttrsv(uplo, trans, n, nrhs, d, e, b, ldb, info)

call zpttrsv(uplo, trans, n, nrhs, d, e, b, ldb, info)

Description

This routine solves one of the triangular systems:

LT*X = B, or L*X = B for real flavors,

or

L*X = B, or LH*X = B,

U*X = B, or UH*X = B for complex flavors,

where L (or U for complex flavors) is the Cholesky factor of a Hermitian positive-definitetridiagonal matrix A such that

A = L*D*LH (computed by spttrf/dpttrf)

or

A = UH*D*U or A = L*D*LH (computed by cpttrf/zpttrf).

Input Parameters

CHARACTER*1. Must be 'U' or 'L'.uploSpecifies whether the superdiagonal or the subdiagonal ofthe tridiagonal matrix A is stored and the form of thefactorization:If uplo = 'U', e is the superdiagonal of U, and A =U'*D*U;if uplo = 'L', e is the subdiagonal of L, and A = L*D*L'.The two forms are equivalent, if A is real.

2108


CHARACTER.transSpecifies the form of the system of equations:for real flavors:if trans = 'N': L*X = B (no transpose)if trans = 'T': LT*X = B (transpose)for complex flavors:if trans = 'N': U*X = B or L*X = B (no transpose)if trans = 'C': UH*X = B or LH*X = B (conjugatetranspose).

INTEGER. The order of the tridiagonal matrix A. n ≥ 0.n

INTEGER. The number of right hand sides, that is, the

number of columns of the matrix B. nrhs ≥ 0.

nrhs

REAL array, DIMENSION (n). The n diagonal elements of thediagonal matrix D from the factorization computed by?pttrf.

d

COMPLEX array, DIMENSION(n-1). The (n-1) off-diagonalelements of the unit bidiagonal factor U or L from thefactorization computed by ?pttrf. See uplo.

e

COMPLEX array, DIMENSION (ldb, nrhs).bOn entry, the right hand side matrix B.

INTEGER.ldbThe leading dimension of the array b.

ldb ≥ max(1, n).

Output Parameters

On exit, the solution matrix X.b


2109


?steqr2Computes all eigenvalues and, optionally,eigenvectors of a symmetric tridiagonal matrixusing the implicit QL or QR method.

Syntax

call ssteqr2(compz, n, d, e, z, ldz, nr, work, info)

call dsteqr2(compz, n, d, e, z, ldz, nr, work, info)

Description

This routine is a modified version of LAPACK routine ?steqr. The routine ?steqr2 computesall eigenvalues and, optionally, eigenvectors of a symmetric tridiagonal matrix using the implicitQL or QR method. ?steqr2 is modified from ?steqr to allow each ScaLAPACK process running?steqr2 to perform updates on a distributed matrix Q. Proper usage of ?steqr2 can be gleanedfrom examination of ScaLAPACK routine p?syev.

Input Parameters

CHARACTER*1. Must be 'N' or 'I'.compzIf compz = 'N', the routine computes eigenvalues only. Ifcompz = 'I', the routine computes the eigenvalues andeigenvectors of the tridiagonal matrix T.z must be initialized to the identity matrix by p?laset or?laset prior to entering this subroutine.

INTEGER. The order of the matrix T(n ≥ 0).n

REAL for single-precision flavorsd, e, workDOUBLE PRECISION for double-precision flavors.Arrays: d contains the diagonal elements of T. The dimensionof d must be at least max(1, n).e contains the (n-1) subdiagonal elements of T. Thedimension of e must be at least max(1, n-1).work is a workspace array. The dimension of work is max(1,2*n-2). If compz = 'N', then work is not referenced.

(local)zREAL for ssteqr2DOUBLE PRECISION for dsteqr2

2110


Array, global DIMENSION (n, n), local DIMENSION (ldz, nr).If compz = 'V', then z contains the orthogonal matrix usedin the reduction to tridiagonal form.

INTEGER. The leading dimension of the array z. Constraints:ldz

ldz ≥ 1,

ldz ≥ max(1, n), if eigenvectors are desired.

INTEGER. nr = max(1, numroc(n, nb, myprow, 0,nprocs)).

nr

If compz = 'N', then nr is not referenced.

Output Parameters

REAL array, DIMENSION (n), for ssteqr2.dDOUBLE PRECISION array, DIMENSION (n), for dsteqr2.On exit, the eigenvalues in ascending order, if info = 0.See also info.

REAL array, DIMENSION (n-1), for ssteqr2.eDOUBLE PRECISION array, DIMENSION (n-1), for dsteqr2.On exit, e has been destroyed.

(local)zREAL for ssteqr2DOUBLE PRECISION for dsteqr2Array, global DIMENSION (n, n), local DIMENSION (ldz, nr).On exit, if info = 0, then,if compz = 'V', z contains the orthonormal eigenvectorsof the original symmetric matrix, and if compz = 'I', zcontains the orthonormal eigenvectors of the symmetrictridiagonal matrix. If compz = 'N', then z is not referenced.

INTEGER.infoinfo = 0, the exit is successful.info < 0: if info = -i, the i-th had an illegal value.info > 0: the algorithm has failed to find all theeigenvalues in a total of 30n iterations;if info = i, then i elements of e have not converged tozero; on exit, d and e contain the elements of a symmetrictridiagonal matrix, which is orthogonally similar to theoriginal matrix.

2111

Utility Functions and RoutinesThis section describes ScaLAPACK utility functions and routines. Summary information aboutthese routines is given in the following table:

Table 7-2 ScaLAPACK Utility Functions and Routines

DescriptionData TypesRoutineName

Returns the square root of the underflow and overflowthresholds if the exponent-range is very large.

s,dp?labad

Performs a simple check for the features of the IEEE standard.(C interface function).

s,dp?lachkieee

Determines machine parameters for floating-point arithmetic.s,dp?lamch

Computes the position of the sign bit of a floating-point number.(C interface function).

s,dp?lasnbt

Error handling routine called by ScaLAPACK routines.pxerbla

p?labadReturns the square root of the underflow andoverflow thresholds if the exponent-range is verylarge.

Syntax

call pslabad(ictxt, small, large)

call pdlabad(ictxt, small, large)

Description

This routine takes as input the values computed by p?lamch for underflow and overflow, andreturns the square root of each of these values if the log of large is sufficiently large. Thissubroutine is intended to identify machines with a large exponent range, such as the Crays,and redefine the underflow and overflow limits to be the square roots of the values computedby p?lamch. This subroutine is needed because p?lamch does not compensate for poorarithmetic in the upper half of the exponent range, as is found on a Cray.

2112


In addition, this routine performs a global minimization and maximization on these values, tosupport heterogeneous computing networks.

Input Parameters

(global) INTEGER.ictxtThe BLACS context handle in which the computation takesplace.

(local).smallREAL PRECISION for pslabad.DOUBLE PRECISION for pdlabad.On entry, the underflow threshold as computed by p?lamch.

(local).largeREAL PRECISION for pslabad.DOUBLE PRECISION for pdlabad.On entry, the overflow threshold as computed by p?lamch.

Output Parameters

(local).smallOn exit, if log10(large) is sufficiently large, the squareroot of small, otherwise unchanged.

(local).largeOn exit, if log10(large) is sufficiently large, the squareroot of large, otherwise unchanged.

p?lachkieeePerforms a simple check for the features of theIEEE standard. (C interface function).

Syntax

void pslachkieee(int *isieee, float *rmax, float *rmin);

void pdlachkieee(int *isieee, float *rmax, float *rmin);

Description

This routine performs a simple check to make sure that the features of the IEEE standard areimplemented. In some implementations, p?lachkieee may not return.

2113


Note that all arguments are call-by-reference so that this routine can be directly called fromFortran code.

This is a ScaLAPACK internal subroutine and arguments are not checked for unreasonablevalues.

Input Parameters

(local).rmaxREAL for pslachkieeeDOUBLE PRECISION for pdlachkieeeThe overflow threshold(= ?lamch ('O')).

(local).rminREAL for pslachkieeeDOUBLE PRECISION for pdlachkieeeThe underflow threshold(= ?lamch ('U')).

Output Parameters

(local). INTEGER.isieeeOn exit, isieee = 1 implies that all the features of theIEEE standard that we rely on are implemented. On exit,isieee = 0 implies that some the features of the IEEEstandard that we rely on are missing.

p?lamchDetermines machine parameters for floating-pointarithmetic.

Syntax

val = pslamch(ictxt, cmach)

val = pdlamch(ictxt, cmach)

Description

This function determines single precision machine parameters.

2114


Input Parameters

(global). INTEGER.The BLACS context handle in which thecomputation takes place.

ictxt

(global) CHARACTER*1.cmachSpecifies the value to be returned by p?lamch:= 'E' or 'e', p?lamch := eps= 'S' or 's' , p?lamch := sfmin= 'B' or 'b', p?lamch := base= 'P' or 'p', p?lamch := eps*base= 'N' or 'n', p?lamch := t= 'R' or 'r', p?lamch := rnd= 'M' or 'm', p?lamch := emin= 'U' or 'u', p?lamch := rmin= 'L' or 'l', p?lamch := emax= 'O' or 'o', p?lamch := rmax,whereeps = relative machine precisionsfmin = safe minimum, such that 1/sfmin does not overflowbase = base of the machineprec = eps*baset = number of (base) digits in the mantissarnd = 1.0 when rounding occurs in addition, 0.0 otherwiseemin = minimum exponent before (gradual) underflowrmin = underflow threshold - base(emin-1)

emax = largest exponent before overflowrmax = overflow threshold - (baseemax)*(1-eps)

Output Parameters

Value returned by the fuction.val

2115


p?lasnbtComputes the position of the sign bit of afloating-point number. (C interface function).

Syntax

void pslasnbt(int *ieflag);

void pdlasnbt(int *ieflag);

Description

This routine finds the position of the signbit of a single/double precision floating point number.This routine assumes IEEE arithmetic, and hence, tests only the 32-nd bit (for single precision)or 32-nd and 64-th bits (for double precision) as a possibility for the signbit. sizeof(int) isassumed equal to 4 bytes.

If a compile time flag (NO_IEEE) indicates that the machine does not have IEEE arithmetic,ieflag = 0 is returned.

Output Parameters

INTEGER.ieflagThis flag indicates the position of the signbit of anysingle/double precision floating point number.ieflag = 0, if the compile time flag NO_IEEE indicatesthat the machine does not have IEEE arithmetic, or ifsizeof(int) is different from 4 bytes.ieflag = 1 indicates that the signbit is the 32-nd bit fora single precision routine.In the case of a double precision routine:ieflag = 1 indicates that the signbit is the 32-nd bit (BigEndian).ieflag = 2 indicates that the signbit is the 64-th bit (LittleEndian).

2116


pxerblaError handling routine called by ScaLAPACKroutines.

Syntax

call pxerbla(ictxt, srname, info)

Description

This routine is an error handler for the ScaLAPACK routines. It is called by a ScaLAPACK routineif an input parameter has an invalid value. A message is printed. Program execution is notterminated. For the ScaLAPACK driver and computational routines, a RETURN statement isissued following the call to pxerbla.

Control returns to the higher-level calling routine, and it is left to the user to determine howthe program should proceed. However, in the specialized low-level ScaLAPACK routines (auxiliaryroutines that are Level 2 equivalents of computational routines), the call to pxerbla() isimmediately followed by a call to BLACS_ABORT() to terminate program execution since recoveryfrom an error at this level in the computation is not possible.

It is always good practice to check for a nonzero value of info on return from a ScaLAPACKroutine. Installers may consider modifying this routine in order to call system-specificexception-handling facilities.

Input Parameters

(global) INTEGERictxtThe BLACS context handle, indicating the global context ofthe operation. The context itself is global.

(global) CHARACTER*6srnameThe name of the routine which called pxerbla.

(global) INTEGER.infoThe position of the invalid parameter in the parameter listof the calling routine.

2117


8Sparse Solver Routines

Intel® Math Kernel Library (Intel® MKL) provides user-callable sparse solver software to solve real orcomplex, symmetric, structurally symmetric or non-symmetric, positive definite, indefinite or Hermitiansparse linear system of equations.

The terms and concepts required to understand the use of the Intel MKL direct sparse solver subroutinesare discussed in the Linear Solvers Basics appendix. If you are familiar with linear sparse solvers andsparse matrix storage schemes, you can omit reading these sections and go directly to the interfacedescriptions. The direct sparse solver PARDISO* is described in the section that follows. After that, twoalternative interfaces (direct sparse solver and iterative sparse solver) that consist of several Intel MKLroutines implementing the step-by-step solution process are described.

PARDISO - Parallel Direct Sparse Solver InterfaceThis section describes the interface to the shared-memory multiprocessing parallel direct sparsesolver known as PARDISO. The interface is Fortran, but can be called from C programs by observingFortran parameter passing and naming conventions used by the supported compilers and operatingsystems. A discussion of the algorithms used in PARDISO and more information on the solver canbe found at http://www.computational.unibas.ch/cs/scicomp.

The PARDISO package is a high-performance, robust, memory efficient and easy to use softwarefor solving large sparse symmetric and unsymmetric linear systems of equations on shared memorymultiprocessors. The solver uses a combination of left- and right-looking Level-3 BLAS supernodetechniques [Schenk00-2]. In order to improve sequential and parallel sparse numerical factorizationperformance, the algorithms are based on a Level-3 BLAS update and pipelining parallelism isexploited with a combination of left- and right-looking supernode techniques [Schenk00,Schenk01, Schenk02, Schenk03]. The parallel pivoting methods allow complete supernodepivoting in order to compromise numerical stability and scalability during the factorization process.For sufficiently large problem sizes, numerical experiments demonstrate that the scalability of theparallel algorithm is nearly independent of the shared-memory multiprocessing architecture and aspeedup of up to seven using eight processors has been observed.

2119

PARDISO supports, as illustrated in Figure 8-1, a wide range of sparse matrix types and computesthe solution of real or complex, symmetric, structurally symmetric or unsymmetric, positivedefinite, indefinite or Hermitian sparse linear system of equations on shared-memorymultiprocessing architectures.

Figure 8-1 Sparse Matrices That Can be Solved With PARDISO

You can find example code that uses PARDISO interface routine to solve systems of linearequations in PARDISO Code Examples section in the Appendix C .

pardisoCalculates the solution of a set of sparse linearequations with multiple right-hand sides.

Syntax

Fortran:

call pardiso (pt, maxfct, mnum, mtype, phase, n, a, ia, ja, perm, nrhs, iparm,msglvl, b, x, error)

C:

pardiso (pt, &maxfct, &mnum, &mtype, &phase, &n, a, ia, ja, perm, &nrhs,iparm, &msglvl, b, x, &error);

2120


(An underscore may or may not be required after “pardiso” depending on the OS and compilerconventions for that OS).

Interface:

SUBROUTINE pardiso (pt, maxfct, mnum, mtype, phase, n, a, ia, ja, perm, nrhs,iparm, msglvl, b, x, error)

INTEGER *4 pt (64)

INTEGER *4 maxfct, mnum, mtype, phase, n, nrhs, error, ia(*), ja(*), perm(*),iparm(*)

REAL *8 a(*), b(n,nrhs), x(n,nrhs)

Note that the above interface is given for the 32-bit architectures. For 64-bit architectures, theargument pt(64) must be defined as INTEGER*8 type.

Description

PARDISO calculates the solution of a set of sparse linear equations

AX = B

with multiple right-hand sides, using a parallel LU, LDL or LLT factorization, where A is an n-by-nmatrix, and X and B are n-by-nrhs matrices. PARDISO performs the following analysis stepsdepending on the structure of the input matrix A.

Symmetric Matrices: The solver first computes a symmetric fill-in reducing permutation Pbased on either the minimum degree algorithm [Liu85] or the nested dissection algorithm fromthe METIS package [Karypis98] (included with Intel MKL), followed by the parallel left-rightlooking numerical Cholesky factorization [Schenk00-2] of PAPT = LLT for symmetricpositive-definite matrices, or PAPT = LDLT for symmetric indefinite matrices. The solver usesdiagonal pivoting, or 1x1 and 2x2 Bunch and Kaufman pivoting for symmetric indefinite matrices,and an approximation of X is found by forward and backward substitution and iterativerefinements.

The coefficient matrix is perturbed whenever numerically acceptable 1x1 and 2x2 pivots cannotbe found within the diagonal supernode block. One or two passes of iterative refinements maybe required to correct the effect of the perturbations. This restricting notion of pivoting withiterative refinements is effective for highly indefinite symmetric systems. Furthermore theaccuracy of this method is for a large set of matrices from different applications areas asaccurate as a direct factorization method that uses complete sparse pivoting techniques[Schenk04]. Another possiblity to improve the pivoting accuracy is to use symmetric weightedmatchings algorithms. These methods identify large entries in the coefficient matrix A that, ifpermuted close to the diagonal, permit the factorization process to identify more acceptable

2121

Sparse Solver Routines 8

pivots and proceed with fewer pivot perturbations. The methods are based on maximum weightedmatchings and improve the quality of the factor in a complementary way to the alternative ideaof using more complete pivoting techniques.

The inertia is also computed for real symmetric indefinite matrices.

Structurally Symmetric Matrices: The solver first computes a symmetric fill-in reducingpermutation P followed by the parallel numerical factorization of PAPT = QLUT. The solver usespartial pivoting in the supernodes and an approximation of X is found by forward and backwardsubstitution and iterative refinements.

Unsymmetric Matrices: The solver first computes a non-symmetric permutation PMPS andscaling matrices Dr and Dc with the aim to place large entries on the diagonal which enhancesgreatly the reliability of the numerical factorization process [Duff99]. In the next step the solvercomputes a fill-in reducing permutation P based on the matrix PMPSA + (PMPSA)T followed by theparallel numerical factorization

QLUR = PPMPSDrADcP

with supernode pivoting matrices Q and R. When the factorization algorithm reaches a pointwhere it cannot factorize the supernodes with this pivoting strategy, it uses a pivotingperturbation strategy similar to [Li99]. The magnitude of the potential pivot is tested againsta constant threshold of alpha = eps*||A2||inf , where eps is the machine precision, A2 =P*PMPS*Dr*A*Dc*P, and ||A2||inf is the infinity norm of the scaled and permuted matrix A.Therefore any tiny pivots encountered during elimination are set to the sign(lii)*eps*||A2||inf - this trades off some numerical stability for the ability to keep pivotsfrom getting too small. Although many failures could render the factorization well-defined butessentially useless, in practice it is observed that the diagonal elements are rarely modified fora large class of matrices. The result of this pivoting approach is that the factorization is, ingeneral, not exact and iterative refinement may be needed.

Direct-Iterative Preconditioning for Unsymmetric Linear Systems. The solver also allowsa combination of direct and iterative methods [Sonn89] in order to accelerate the linear solutionprocess for transient simulation. A majority of applications of sparse solvers require solutionsof systems with gradually changing values of the nonzero coefficient matrix, but the sameidentical sparsity pattern. In these applications, the analysis phase of the solvers has to beperformed only once and the numerical factorizations are the important time-consuming stepsduring the simulation. PARDISO uses a numerical factorization A = LU for the first system andapplies these exact factors L and U for the next steps in a preconditioned Krylow-Subspaceiteration. If the iteration does not converge, the solver will automatically switch back to thenumerical factorization. This method can be applied for unsymmetric matrices in PARDISO andthe user can select the method using only one input parameter. For further details see theparameter description (iparm(4), iparm(20)).

2122


The sparse data storage in PARDISO follows the scheme described in Sparse Matrix StorageFormat section with ja standing for columns, ia for rowIndex, and a for values. The algorithmsin PARDISO require column indices ja to be increasingly ordered per row and the presence ofthe diagonal element per row for any symmetric or structurally symmetric matrix. Theunsymmetric matrices need no diagonal elements in the PARDISO solver.

There are four tasks that PARDISO is capable of performing, namely analysis and symbolicfactorization, numerical factorization, forward and backward substitution including iterativerefinement and finally the termination to release all internal solver memory. When an inputdata structure is not accessed in a call, a NULL pointer or any valid address can be passed asa place holder for that argument.

Input Parameters

INTEGER*4 for 32 -bit operating systems;ptINTEGER*8 for 64 -bit operating systemsArray, DIMENSION (64)On entry, this is the solver internal data address pointer.Theses addresses are passed to the solver and all relatedinternal memory management is organized through thispointer

NOTE. pt is an integer array with 64 entries. It isvery important that the pointer is initialized with zeroat the first call of PARDISO. After that first call youshould never modify the pointer, as a serious memoryleak can occur. The integer length should be 4-byteon 32-bit operating systems and 8-byte on 64-bitoperating systems.

INTEGERmaxfctMaximal number of factors with identical nonzero sparsitystructure that the user would like to keep at the same timein memory. It is possible to store several differentfactorizations with the same nonzero structure at the sametime in the internal data management of the solver. In mostof the applications this value is equal to 1.

2123


PARDISO can process several matrices with identical matrixsparsity pattern and is able to store the factors of thesematrices at the same time. Matrices with different sparsitystructure can be kept in memory with different memoryaddress pointers pt.

INTEGERmnumActual matrix for the solution phase. With this scalar youcan define the matrix that you would like to factorize. The

value must be: 1 ≤ mnum ≤ maxfct.In most of the applications this value is equal to 1.

INTEGERmtypeThis scalar value defines the matrix type. The PARDISOsolver supports the following matrices:

real and structurally symmetricmatrix

= 1

real and symmetric positive definitematrix

= 2

real and symmetric indefinite matrix= -2

complex and structurally symmetricmatrix

= 3

complex and Hermitian positivedefinite matrix

= 4

complex and Hermitian indefinitematrix

= -4

complex and symmetric matrix= 6

real and unsymmetric matrix= 11

complex and unsymmetric matrix= 13

This parameter influences the pivoting method.

INTEGERphaseControls the execution of the solver. It is a two-digit integer

ij (10i + j, 1≤ i ≤3, i<j ≤3 for normal executionmodes). The i digit indicates the starting phase of execution,and j indicates the ending phase. PARDISO has the followingphases of execution:

2124

• Phase 1: Fill-reduction analysis and symbolic factorization

• Phase 2: Numerical factorization

• Phase 3: Forward and Backward solve including iterativerefinements

• Termination and Memory Release Phase (phase≤ 0)

If a previous call to the routine has computed informationfrom previous phases, execution may start at any phase.The phase parameter can have the following values:

Solver Execution Stepsphase

Analysis11

Analysis, numerical factorization12

Analysis, numerical factorization, solve,iterative refinement

13

Numerical factorization22

Numerical factorization, solve, iterativerefinement

23

Solve, iterative refinement33

Release internal memory for L and Umatrix number mnum

0

Release all internal memory for allmatrices

-1

INTEGERnNumber of equations. This is the number of equations inthe sparse linear systems of equations A*X = B. Constraint:n > 0.

REAL/COMPLEXaArray. Contains the nonzero values of the coefficient matrixA corresponding to the indices in ja. The size of a is thesame as that of ja and the coefficient matrix can be eitherreal or complex. The matrix must be stored in compressedsparse row format with increasing values of ja for each row.Refer to values array description in Sparse Matrix StorageFormat for more details.

2125


NOTE. The nonzeros of each row of the matrix Amust be stored in increasing order. For symmetricor structural symmetric matrices it is also importantthat the diagonal elements are also available andstored in the matrix. If the matrix is symmetric, thenthe array a is only accessed in the factorizationphase, in the triangular solution and iterativerefinement phase. Unsymmetric matrices areaccessed in all phases of the solution process.

INTEGERia

Array, dimension (n+1). For i≤n, ia(i) points to the firstcolumn index of row i in the array ja in compressed sparserow format. That is, ia(i) gives the index of the elementin array a that contains the first non-zero element from rowi of A. The last element ia(n+1) is taken to be equal tothe number of non-zeros in A, plus one. Refer to rowIndexarray description in Sparse Matrix Storage Format for moredetails.The array ia is also accessed in all phases of thesolution process. Note that the row and columns numbersstart from 1.

INTEGERjaArray. ja(*) contains column indices of the sparse matrixA stored in compressed sparse row format. The indices ineach row must be sorted in increasing order.The array jais also accessed in all phases of the solution process. Forsymmetric and structurally symmetric matrices it is assumedthat zero diagonal elements are also stored in the list ofnonzeros in a and ja. For symmetric matrices, the solverneeds only the upper triangular part of the system as isshown for columns array in Sparse Matrix Storage Format.

INTEGERpermArray, dimension (n). Holds the permutation vector of sizen. The array perm is defined as follows. Let A be the originalmatrix and B = P*A*PT be the permuted matrix. Row

2126


(column) i of A is the perm(i) row (column) of B. Thenumbering of the array must start by 1 and it must describea permutation.On entry, you can apply your own fill-in reducing orderingto the solver. The permutation vector perm is only accessedif iparm(5) = 1.

INTEGERnrhsNumber of right-hand sides that need to be solved for.

INTEGERiparmArray, dimension (64). This array is used to pass variousparameters to PARDISO and to return some usefulinformation after the execution of the solver. If iparm(1)= 0, then PARDISO fills iparm(1), and iparm(4) throughiparm(64)with default values and uses them. Note thatthere is no default values for iparm(3) and this value mustalways be supplied by the user, whether iparm(1) is 0 or1.Individual components of the iparm array are describedbelow (some of them are described in the Output Parameterssection).iparm(1)- use default values.If iparm(1) = 0, then iparm(2) and iparm(4) throughiparm(64) are filled with default values, otherwise the userhas to supply all values in iparm from iparm(2) toiparm(64).iparm(2) - fill-in reducing ordering.iparm(2) controls the fill-in reducing ordering for the inputmatrix. If iparm(2) is 0, then the minimum degreealgorithm is applied [Li99], if iparm(2) is 2, the solver usesthe nested dissection algorithm from the METIS package[Karypis98]. The default value of iparm(2) is 2.iparm(3)- number of processors.iparm(3) must contain the number of processors that areavailable for the parallel execution. The number must beequal to the OpenMP environment variableOMP_NUM_THREADS.

2127


CAUTION. If the user has not explicitly setOMP_NUM_THREADS, then this value can be set by theoperating system to the maximal numbers ofprocessors on the system. It is therefore alwaysrecommended to control the parallel execution of thesolver by explicitly setting OMP_NUM_THREADS. If lessprocessors are available than specified, the executionmay slow down instead of speeding up.

There is no default value for iparm(3).iparm(4) - preconditioned CGS.This parameter controls preconditioned CGS [Sonn89] forunsymmetric or structural symmetric matrices andConjugate-Gradients for symmetric matrices. iparm(4) hasthe form

iparm(4)= 10*L+K

The K and L values have the meanings as follow.

DescriptionValue of K

The factorization is always computed asrequired by phase.

0

CGS iteration replaces the computationof LU. The preconditioner is LU that wascomputed at a previous step (the first

1

step or last step with a failure) in asequence of solutions needed foridentical sparsity patterns.CG iteration for symmetric matricesreplaces the computation of LU. Thepreconditioner is LU that was computed

2

at a previous step (the first step or laststep with a failure) in a sequence ofsolutions needed for identical sparsitypatterns.

Value L:The value L controls the stopping criterion of theKrylow-Subspace iteration:

2128


epsCGS = 10-L is used in the stopping criterion

||dxi|| / ||dx1|| < epsCGS

with ||dxi|| = ||inv(L*U)*ri|| and ri is the residuumat iteration i of the preconditioned Krylow-Subspaceiteration.Strategy: A maximum number of 150 iterations is fixed byexpecting that the iteration will converge before consuminghalf the factorization time. Intermediate convergence ratesand residuum excursions are checked and can terminatethe iteration process. If phase =23, then the factorizationfor a given A is automatically recomputed in these caseswhere the Krylow-Subspace iteration failed, and thecorresponding direct solution is returned. Otherwise thesolution from the preconditioned Krylow-Subspace iterationis returned. Using phase =33 results in an error message(error =4) if the stopping criteria for the Krylow-Subspaceiteration can not be reached. More information on the failurecan be obtained from iparm(20).The default is iparm(4)=0, and other values are onlyrecommended for an advanced user. iparm(4) must begreater or equal to zero.Examples:

Descriptioniparm(4)

LU-preconditioned CGS iteration with astopping criterion of 1.0E-3 forunsymmetric matrices

31

LU-preconditioned CGS iteration with astopping criterion of 1.0E-6 forunsymmetric matrices

61

LU-preconditioned CGS iteration with astopping criterion of 1.0E-6 forsymmetric matrices

62

iparm(5)- user permutation.This parameter controls whether the user supplied fill-inreducing permutation is used instead of the integratedmultiple-minimum degree or nested dissection algorithms.

2129

This option may be useful for testing reordering algorithmsor adapting the code to special applications problems (forinstance, to move zero diagonal elements to the endP*A*PT). For definition of the permutation, see descriptionof the perm parameter.The default value of iparm(5) is 0.iparm(6)- write solution on x.If iparm(6)is 0 (which is the default), then the array xcontains the solution and the value of b is not changed. Ifiparm(6) is 1, then the solver will store the solution on theright hand side b.Note that the array x is always used. The default value ofiparm(6) is 0.iparm(8)On entry to the solve and iterative refinement step,iparm(8)should be set to the maximum number of iterativerefinement steps that the solver will perform. The solverwill not perform more than the absolute value ofiparm(8)steps of iterative refinement and will stop theprocess if a satisfactory level of accuracy of the solution interms of backward error has been achieved.Note that if iparm(8)< 0, the accumulation of the residuumis using enhanced precision real and complex data types.Perturbed pivots result in iterative refinement (independentof iparm(8)=0) and the iteration number executed isreported on iparm(7).The solver will automatically perform two steps of iterativerefinements when perturbed pivots have been obtainedduring the numerical factorization and iparm(8) was equalto zero.The number of performed iterative refinement steps isreported on iparm(8).The default value for iparm(8) is 0.iparm(9)This parameter is reserved for future use. Its value mustbe set to 0.iparm(10)- pivoting perturbation.

2130

This parameter instructs PARDISO how to handle smallpivots or zero pivots for unsymmetric matrices (mtype =11or mtype =13) and symmetric matrices (mtype =-2, mtype=-4, or mtype =6). For these matrices the solver uses acomplete supernode pivoting approach. When thefactorization algorithm reaches a point where it cannotfactorize the supernodes with this pivoting strategy, it usesa pivoting perturbation strategy similar to [Li99],[Schenk04].The magnitude of the potential pivot is tested against aconstant threshold of

alpha = eps*||A2||inf,

where eps = 10(-iparm(10)), A2 = P*PMPS*Dr*A*Dc*P,

and ||A2||inf is the infinity norm of the scaled andpermuted matrix A. Any tiny pivots encountered duringelimination are set to the sign (lii)*eps*||A2||inf - thistrades off some numerical stability for the ability to keeppivots from getting too small. Small pivots are thereforeperturbed with eps = 10(-iparm(10)).The default value of iparm(10) is 13 and therefore eps =1.0E-13 for unsymmetric matrices (mtype =11 or mtype=13).The default value of iparm(10) is 8, and therefore eps =1.0E-8 for symmetric indefinite matrices (mtype =-2,mtype =-4, or mtype =6).iparm(11)- scaling vectors.PARDISO uses a maximum weight matching algorithm topermute large elements on the diagonal and to scale thematrix so that the diagonal elements are equal to 1 and theabsolute value of the off-diagonal entries are less or equalto 1. This scaling method is only applied to unsymmetricmatrices (mtype =11 or mtype =13). The scaling can alsobe used for symmetric indefinite matrices (mtype =-2,mtype =-4, or mtype =6) in case that symmetric weightedmatchings is applied (iparm(13)= 1).It is recommended to use iparm(11) = 1 (scaling) andiparm(13) = 1 (matchings) for highly indefinite symmetricmatrices, for example from interior point optimizations or

2131


saddle point problems. It is also very important to note thatthe user must provided in the analysis phase (phase=11)the numerical values of the matrix A in case of scalings andsymmetric weighted matchings.The default value of iparm(11) is 1 for unsymmetricmatrices (mtype =11 or mtype =13). The default value ofiparm(11) is 0 for symmetric indefinite matrices (mtype=-2, mtype =-4, or mtype =6).iparm(12)This parameter is reserved for future use. Its value mustbe set to 0.iparm(13) - improved accuracy using (non-)symmetricweighted matchings.PARDISO can use a maximum weighted matching algorithmto permute large elements close the diagonal. This strategyadds an additional level of reliability to our factorizationmethods and can be seen as a complement to the alternativeidea of using more complete pivoting techniques during thenumerical factorization.It is recommended to use iparm(11)=1 (scalings) andiparm(13)=1 (matchings) for highly indefinite symmetricmatrices, for example from interior point optimizations orsaddle point problems. It is also very important to note thatthe user must provided in the analysis phase (phase =11)the numerical values of the matrix A in case of scalings andsymmetric weighted matchings.The default value of iparm(13) is 1 for unsymmetricmatrices (mtype =11 or mtype =13). The default value ofiparm(13) is 0 for symmetric matrices (mtype =-2, mtype=-4, or mtype =6).iparm(18)The solver will report the numbers of nonzeros on the factorsif iparm(18)< 0 on entry.The default value of iparm(18) is -1.iparm(19)- MFlops of factorization.If iparm(19)< 0 on entry, the solver will report MFlop(1.0E6) that are necessary to factor the matrix A. This willincrease the reordering time.The default value of iparm(19) is 0.

2132

iparm(21)- pivoting for symmetric indefinite matrices.iparm(21)controls the pivoting method for sparsesymmetric indefinite matrices. If iparm(21) is 0, then 1x1diagonal pivoting is used. If iparm(21) is 1, then 1x1 and2x2 Bunch and Kaufman pivoting will be used within thefactorization process. It is also recommended to useiparm(11)=1 (scalings) and iparm(13)=1 (matchings)for highly indefinite symmetric matrices, for example frominterior point optimizations or saddle point problems. Bunchand Kaufman pivoting is available for matrices: mtype =-2,mtype =-4, or mtype =6.The default value of iparm(21)is 0.

INTEGERmsglvlMessage level information. If msglvl = 0 then PARDISOgenerates no output, if msglvl = 1 the solver printsstatistical information to the screen.

REAL/COMPLEXbArray, dimension (n,nrhs). On entry, contains the righthand side vector/matrix B. Note that b is only accessed inthe solution phase.

Output Parameters

This parameter contains internal address pointers.pt

On output, some iparm values will report useful information,for example, numbers of nonzeros in the factors, and so on.

iparm

iparm(7)- number of performed iterative refinement steps.The number of iterative refinement steps that are actuallyperformed during the solve step.iparm(14)- number of perturbed pivots.After factorization, iparm(14) contains the number ofperturbed pivots during the elimination process for mtype=11, mtype =13, mtype =-2, mtype =-4, or mtype =-6.iparm(15)- peak memory symbolic factorization.The parameter iparm(15) provides the user with the totalpeak memory in KBytes that the solver needed during theanalysis and symbolic factorization phase. This value is onlycomputed in phase 1.

2133


iparm(16)- permanent memory symbolic factorization.The parameter iparm(16) provides the user with thepermanent memory in KBytes that the solver needed fromthe analysis and symbolic factorization phase in thefactorization and solve phases. This value is only computedin phase 1.iparm(17)- memory numerical factorization and solution.The parameter iparm(17) provides the user with the totaldouble precision memory consumption (KBytes) of the solverfor the factorization and solve phases. This value is onlycomputed in phase 2.Note that the total peak memory solver consumption ismax(iparm(15), iparm(16)+iparm(17))iparm(18)- number of nonzeros in factors.The solver will report the numbers of nonzeros on the factorsif iparm(18) < 0 on entry.iparm(19)- MFlops of factorization.Number of operations in MFlop (1.0E6 operations) that arenecessary to factor the matrix A are returned to the user ifiparm(19) < 0 on entry.iparm(20)- CG/CGS diagnostics.The value is used to give CG/CGS diagnostics (for example,the number of iterations and cause of failure):If iparm(20)> 0, CGS succeeded, and the number ofiterations executed are reported in iparm(20).If iparm(20)< 0, iterations executed, but CG/CGS failed.The error report details in iparm(20) are of the form:iparm(20)= - it_cgs*10 - cgs_error.If phase is 23, then the factors L, U are recomputed for thematrix A and the error flag error is zero in case of asuccessful factorization. If phase is 33, then error = -4signals the failure.Description of cgs_error is given in the table below:

Descriptioncgs_error- fluctuations of the residuum are toolarge

1

- ||dxmax_it_cgs/2|| too large (slowconvergence)

2

2134

Descriptioncgs_error- stopping criterion not reached atmax_it_cgs

3

- perturbed pivots caused iterativerefinement

4

- factorization is too fast for this matrix.It is better to use the factorizationmethod with iparm(4)=0

5

iparm(22)- inertia: number of positive eigenvalues.The parameter iparm(22) reports the number of positiveeigenvalues for symmetric indefinite matrices.iparm(23)- inertia: number of negative eigenvalues.The parameter iparm(23) reports the number of negativeeigenvalues for symmetric indefinite matrices.iparm(24) to iparm(64)These parameters are reserved for future use. Their valuesmust be set to 0.

On output, the array is replaced with the solution ifiparm(6) = 1.

b

REAL/COMPLEXxArray, dimension (n,nrhs). On output, contains solution ifiparm(6)=0. Note that x is only accessed in the solutionphase.

INTEGERerrorThe error indicator according to the below table:

Informationerror

no error0

input inconsistent-1

not enough memory-2

reordering problem-3

zero pivot, numerical factorization oriterative refinement problem

-4

unclassified (internal) error-5

preordering failed (matrix types 11, 13only)

-6

2135


Informationerror

diagonal matrix problem-7

Direct Sparse Solver (DSS) Interface RoutinesThe Intel MKL supports an alternative to PARDISO interface for the direct sparse solver referredto here as DSS interface. The DSS interface implements a group of user-callable routines thatare used in the step-by-step solving process and exploits the general scheme described inLinear Solvers Basics for solving sparse systems of linear equations. This interface also includesone routine for gathering statistics related to the solving process and an auxiliary routine forpassing character strings from Fortran routines to C routines.

The solving process is conceptually divided into six phases, as shown in Table 8-1 which liststhe names of the routines, grouped by phase, and describes their general use.

Table 8-1 DSS Interface Routines

DescriptionRoutine

Initializes the solver and creates the basic datastructures necessary for the solver. This routinemust be called before any other DSS routine.

dss_create

Used to inform the solver of the locations of thenon-zero elements of the array.

dss_define_structure

Based on the non-zero structure of the matrix, thisroutine computes a permutation vector to reducefill-in during the factoring process.

dss_reorder

Computes the LU, LDTt or LLT factorization of a realor complex matrix.

dss_factor_real,dss_factor_complex

Computes the solution vector for a system ofequations based on the factorization computed bythe previous phase.

dss_solve_real, dss_solve_complex

Deletes all of the data structures created during thesolutions process.

dss_delete

Returns statistics about various phases of the solvingprocess. Used to gather statistics in the followingareas: time taken to do reordering, time taken to

dss_statistics

2136


DescriptionRoutine

do factorization, problem solving duration,determinant of a matrix, inertia of a matrix, andnumber of floating point operations taken duringfactorization. Can be invoked at any phase of thesolving process after the “reorder” phase, but beforethe “delete” phase. Note that appropriateargument(s) must be supplied to this routine tocorrespond to phase at which it is invoked.

Used to pass character strings from Fortran routinesto C routines.

mkl_cvt_to_null_terminated_str

To find a single solution vector for a single system of equations with a single right hand side,the Intel MKL DSS interface routines are invoked in the order in which they are listed in Table8-1 , with the exception of dss_statistics, which is invoked as described in the table.

However, in certain applications it is necessary to produce solution vectors for multiple right-handsides for a given factorization and/or factor several matrices with the same non-zero structure.Consequently, it is necessary to be able to invoke the Intel MKL sparse routines in an orderother than listed in the table. The following diagram in Figure 8-2 indicates the typical order(s)in which the DSS interface routines can be invoked.

Figure 8-2 Typical order for invoking DSS interface routines

2137


You can find example code that uses DSS interface routines to solve systems of linear equationsin Direct Sparse Solver Examples section in the appendix.

DSS Interface Description

As noted in Memory Allocation and Handles section, each DSS routine either reads or writesan opaque data object called a handle. Because the declaration of a handle varies from languageto language, it is declared as being of type MKL_DSS_HANDLE in this documentation. You canrefer to Memory Allocation and Handles to determine the correct method for declaring a handleargument.

All other types in this documentation refer to the standard Fortran types, INTEGER, REAL,COMPLEX, DOUBLE PRECISION, and DOUBLE COMPLEX.

C and C++ programmers should refer to Calling Direct Sparse Solver Routines From C/C++for information on mapping Fortran types to C/C++ types.

Routine Options

All of the DSS routines have an integer argument (below referred to as opt) for passing variousoptions to the routines. The permissible values for opt should be specified using only the symbolconstants defined in the language-specific header files (see Implementation Details). All of theroutines accept options for setting the message and termination level as described in Table 8-2. Additionally, all routines accept the option MKL_DSS_DEFAULTS, which establishes thedocumented default options for each DSS routine.

Table 8-2 Symbolic Names for the Message and Termination Level Options

Termination LevelMessage Level

MKL_DSS_TERM_LVL_SUCCESSMKL_DSS_MSG_LVL_SUCCESS

MKL_DSS_TERM_LVL_INFOMKL_DSS_MSG_LVL_INFO

MKL_DSS_TERM_LVL_WARNINGMKL_DSS_MSG_LVL_WARNING

MKL_DSS_TERM_LVL_ERRORMKL_DSS_MSG_LVL_ERROR

MKL_DSS_TERM_LVL_FATALMKL_DSS_MSG_LVL_FATAL

The settings for message and termination level can be set on any call to a DSS routine. However,once set to a particular level, they remain at that level until they are changed in another callto a DSS routine.

Users can specify multiple options to a DSS routine by adding the options together. For example,to set the message level to debug and the termination level to error for all DSS routines, usethe call:

CALL dss_create( handle, MKL_DSS_MSG_LVL_INFO + MKL_DSS_TERM_LVL_ERROR)

2138


User Data Arrays

Many of the DSS routines take arrays of user data as input. For example, user arrays are passedto the routine dss_define_structure to describe the location of the non-zero entries in thematrix. In order to minimize storage requirements and improve overall run-time efficiency, theIntel MKL DSS routines do not make copies of the user input arrays.

WARNING. Users cannot modify the contents of these arrays after they are passed toone of the solver routines.

dss_createInitializes the solver.

Syntax

dss_create(handle, opt)

Input Parameters

INTEGER Options passing argument. The default value isMKL_DSS_MSG_LVL_WARNING+ MKL_DSS_TERM_LVL_ERROR.

opt

Output Parameters

Data object of MKL_DSS_HANDLE type (see InterfaceDescription).

handle

Description

The routine dss_create is called to initialize the solver. After the call to dss_create, allsubsequent invocations of Intel MKL DSS routines should use the value of handle returned bydss_create.

WARNING. Do not write the value of handle directly.

Return Values

MKL_DSS_SUCCESS

2139


MKL_DSS_INVALID_OPTION

MKL_DSS_OUT_OF_MEMORY

dss_define_structureCommunicates to the solver locations of non-zeroelements in the matrix.

Syntax

dss_define_structure(handle, opt, rowIndex, nRows, nCols, columns, nNonZeros);

Input Parameters

INTEGER. Option passing argument. The default option forthe matrix structure is MKL_DSS_SYMMETRIC.

opt

INTEGER. Array of size min(nRows, nCols)+1. Defines thelocation of non-zero entries in the matrix.

rowIndex

INTEGER. Number of rows in the matrix.nRows

INTEGER. Number of columns in the matrix.nCols

INTEGER. Array of size nNonZeros. Defines the location ofnon-zero entries in the matrix.

columns

INTEGER. Number of non-zero elements in the matrix.nNonZeros

Output Parameters


handle

Description

The routine dss_define_structure communicates to the solver the locations of the nNonZerosnumber of non-zero elements in a matrix of size nRows by nCols.

Note that currently Intel MKL DSS software only operates on square matrices, so nRows mustbe equal to nCols.

To communicate the locations of non-zeros in the matrix, do the following:

1. Define the general non-zero structure of the matrix by specifying one of the following valuesfor the options argument opt:

2140


MKL_DSS_SYMMETRIC_STRUCTURE•

• MKL_DSS_SYMMETRIC

• MKL_DSS_NON_SYMMETRIC

2. Provide the actual locations of the non-zeros by means of the arrays rowIndex and columns(see Sparse Matrix Storage Format).

Return Values

MKL_DSS_SUCCESS

MKL_DSS_STATE_ERR


MKL_DSS_COL_ERR

MKL_DSS_NOT_SQUARE

MKL_DSS_TOO_FEW_VALUES

MKL_DSS_TOO_MANY_VALUES

dss_reorderComputes permutation vector that minimizes thefill-in during the factorization phase.

Syntax

dss_reorder(handle, opt, perm)

Input Parameters

INTEGER. Option passing argument. The default option forthe permutation type is MKL_DSS_AUTO_ORDER.

opt

INTEGER. Array of length nRows. Contains a user-definedpermutation vector (accessed only if opt containsMKL_DSS_MY_ORDER).

perm

2141


Output Parameters


handle

Description

If opt contains the options MKL_DSS_AUTO_ORDER, then dss_reorder computes a permutationvector that minimizes the fill-in during the factorization phase. For this option, the perm arrayis never accessed.

If opt contains the option MKL_DSS_MY_ORDER, then the array perm is considered to be apermutation vector supplied by the user. In this case, the array perm is of length nRows, wherenRows is the number of rows in the matrix as defined by the previous call to dss_define_structure.

Return Values

MKL_DSS_SUCCESS

MKL_DSS_STATE_ERR



dss_factor_real, dss_factor_complexCompute the factorization of the matrix withpreviously specified location.

Syntax

dss_factor_real(handle, opt, rValues)

dss_factor_complex(handle, opt, cValues)

Input Parameters


handle

INTEGER Option passing argument. The default option forthe matrix type is MKL_DSS_POSITIVE_DEFINITE.

opt

DOUBLE PRECISION. Array of size nNonZeros. Contains realnon-zero elements of the matrix.

rValues

2142


DOUBLE COMPLEX. Array of size nNonZeros. Containscomplex non-zero elements of the matrix.

cValues

Description

These routines compute the factorization of the matrix whose non-zero locations were previouslyspecified by a call to dss_define_structure and whose non-zero values are given in the arrayrValues or cValues. The arrays rValues and cValues are assumed to be of length nNonZerosas defined in a previous call to dss_define_structure.

The opt argument should contain one of the following options:

• MKL_DSS_POSITIVE_DEFINITE,

• MKL_DSS_INDEFINITE,

• MKL_DSS_HERMITIAN_POSITIVE_DEFINITE,

• MKL_DSS_HERMITIAN_INDEFINITE ,

depending on whether the non-zero values in rValues and cValues describe a positive definite,indefinite, or Hermitian matrix.

Return Values

MKL_DSS_SUCCESS

MKL_DSS_STATE_ERR


MKL_DSS_OPTION_CONFLICT


MKL_DSS_ZERO_PIVOT

dss_solve_real, dss_solve_complexCompute the corresponding solutions vector andplace it in the output array.

Syntax

dss_solve_real(handle, opt, rRhsValues, nRhs, rSolValues)

dss_solve_complex(handle, opt, cRhsValues, nRhs, cSolValues)

2143


Input Parameters


handle

INTEGER. Option passing argument.opt

INTEGER. Number of the right-hand sides in the linearequation.

nRhs

DOUBLE PRECISION. Array of size nRows by nRhs. Containsreal right-hand side vectors.

rRhsValues

DOUBLE COMPLEX. Array of size nRows by nRhs. Containscomplex right-hand side vectors.

cRhsValues

Output Parameters

DOUBLE PRECISION. Array of size nCols by nRhs. Containsreal solution vectors.

rSolValues

DOUBLE COMPLEX. Array of size nCols by nRhs. Containscomplex solution vectors.

cSolValues

Description

For each right hand side column vector defined in ?RhsValues (where ? is one of r or c), theseroutines compute the corresponding solutions vector and place it in the array ?SolValues.

The lengths of the right-hand side and solution vectors, nCols and nRows respectively, areassumed to have been defined in a previous call to dss_define_structure.

Return Values

MKL_DSS_SUCCESS

MKL_DSS_STATE_ERR



2144


dss_deleteDeletes all of data structures created during thesolutions process.

Syntax

dss_delete(handle, opt)

Input Parameters

INTEGER. Options passing argument. The default value isMKL_DSS_MSG_LVL_WARNING + MKL_DSS_TERM_LVL_ERROR.

opt

Output Parameters


handle

Description

The routine dss_delete is called to delete all of the data structures created during the solutionsprocess.

Return Values

MKL_DSS_SUCCESS



dss_statisticsReturns statistics about various phases of thesolving process.

Syntax

dss_statistics(handle, opt, statArr, retValues)

2145


Input Parameters


handle

INTEGER Options passing argument.opt

STRING Input string that defines the type of the returnedstatistics. Can include one or more of the following stringconstants (case of the input string has no effect):

statArr

Amount of time taken to do thereordering.

ReorderTime

Amount of time taken to do thefactorization.

FactorTime

Amount of time taken to solve theproblem after factorization.

SolveTime

Determinant of the matrix A. For realmatrices, determinant is returned asdet_pow, det_base in two consecutive

Determinant

return array locations, where: 1.0

≤abs(det_base) < 10.0 anddeterminant = det_base*10(det_pow)

For complex matrices, determinant isreturned as det_pow, det_re, det_im inthree consecutive return array locations,

where: 1.0 ≤abs(det_re) +abs(det_im) < 10.0 and determinant= det_re, det_im*10(det_pow)

Inertia of a real symmetric matrix isdefined to be a triplet of nonnegativeintegers (p,n,z) where p is a number of

Inertia

positive eigenvalues, n is number ofnegative eigenvalues, and z is number ofzero eigenvalues.Inertia will be returned as threeconsecutive return array locations asp,n,z.

2146

Computing Inertia is only recommendedfor stable matrices. Unstable matrices canlead to incorrect results.Inertia of a k by k real symmetricpositive definite matrix is always(k,0,0). Therefore Inertia is returnedonly in cases of real symmetric indefinitematrices. For all other matrix types, anerror message is returned.

Number of floating point operationsperformed during factorization.

Flops

NOTE. To avoid problems in passing strings fromFortran to C, Fortran users must call themkl_cvt_to_null_terminated_str routine beforecalling dss_statistics. Refer to the description ofmkl_cvt_to_null_terminated_str for details.

Output Parameters

DOUBLE PRECISION Value of the statistics returned.retValues

Description

The dss_statistics routine returns statistics about various phases of the solving process.Use this routine to gather statistics in the following areas:

– time taken to do reordering,

– time taken to do factorization,

– problem solving duration,

– determinant of a matrix,

– inertia of a matrix,

– number of floating point operations taken during factorization.

Statistics are returned corresponding to the specified input string. The value of the statisticsis returned in double precision in a return array allocated by user.

2147


For multiple statistics, string constants separated by commas can be used as input. Returnvalues are put into the return array in the same order as specified in the input string.

Statistics should only be requested at appropriate stages of the solving process. For example,inquiring about FactorTime before a matrix is factored will lead to errors.

The following table shows the point at which each statistic can be called:

Table 8-3 Statistics Calling Sequences

When to CallType of Statistics

After dss_reorder is completed successfully.ReorderTime

After dss_factor_real or dss_factor_complex is completed successfully.FactorTime

After dss_solve_real or dss_solve_complex is completed successfully.SolveTime

After dss_factor_real or dss_factor_complex is completed successfully.Determinant

After dss_factor_real is completed successfully and matrix is real, symmetric,

and indefinite.

Inertia

After dss_factor_real or dss_factor_complex is completed successfully.Flops

Finding “time used to reorder” and “inertia” of a matrix.

The example below illustrates the use of the dss_statistics routine.

To find these values, call dss_statistics(handle, opt, statArr, retValues), where statArris “ReorderTime,Inertia”

In this example, retValues will have the following values:

Return Values

MKL_DSS_SUCCESS

MKL_DSS_STATISTICS_INVALID_MATRIX

MKL_DSS_STATISTICS_INVALID_STATE

MKL_DSS_STATISTICS_INVALID_STRING

2148


mkl_cvt_to_null_terminated_strPasses character strings from Fortran routines toC routines.

Syntax

mkl_cvt_to_null_terminated_str (destStr, destLen, srcStr)

Input Parameters

INTEGER. Length of the output array destStr.destLen

STRING. Input string.srcStr

Output Parameters

INTEGER. One-dimensional array of integer.destStr

Description

The routine mkl_cvt_to_null_terminated_str is used to pass character strings from Fortranroutines to C routines. The strings are converted into integer arrays before being passed to C.Using this routine avoids the problems that can occur on some platforms when passing stringsfrom Fortran to C. The use of this routine is highly recommended.

Implementation Details

Several aspects of the Intel MKL DSS interface are platform-specific and language-specific. Inorder to promote portability across platforms and ease of use across different languages, usersare encouraged to include one of the Intel MKL DSS language-specific header files. Currently,there are three language specific header files:

• mkl_dss.f77 for F77 programs

• mkl_dss.f90 for F90 programs

• mkl_dss.h for C programs

These language-specific header files define symbolic constants for error returns, function options,certain defined data types, and function prototypes.

2149


NOTE. It is strongly recommended that you refer to the constants for options, errorreturns, and message severities only by the symbolic names that are defined in theheader files. Use of the Intel MKL DSS software without including one of the above headerfiles is not supported.

Memory Allocation and Handles

In order to make the Intel MKL DSS routines as easy to use as possible, the routines do notrequire the user to allocate any temporary working storage. Any storage required by the solver(that is not a user input) is allocated by the solver itself. In order to allow multiple users toaccess the solver simultaneously, the solver keeps track of the storage allocated for a particularapplication by using an opaque data object called a handle.

Each of the Intel MKL DSS routines either creates, uses or deletes a handle. Consequently,user programs must be able to allocate storage for a handle. The exact syntax for allocatingstorage for a handle varies from language to language. To help standardize the handledeclarations, the language-specific header files declare constants and defined data types thatshould be used when declaring a handle object in user code.

Fortran 90 programmers should declare a handle as:

INCLUDE "mkl_dss.f90"

TYPE(MKL_DSS_HANDLE) handle

C and C++ programmers should declare a handle as:

#include "mkl_dss.h"

_MKL_DSS_HANDLE_t handle;

Fortran 77 programmers using compilers that support eight byte integers, should declare ahandle as:

INCLUDE "mkl_dss.f77"

INTEGER*8 handle

Otherwise they should replace INTEGER*8 with DOUBLE PRECISION.

In addition to the necessary definition for the correct declaration of a handle, the include filealso defines the following:

• function prototypes for languages that support prototypes

• symbolic constants that are used for the error returns

• user options for the solver routines

2150


• message severity

Iterative Sparse Solvers basedonReverse CommunicationInterface (RCI ISS)

The Intel MKL supports an additional to PARDISO interface, namely, the iterative sparse solvers(ISS) based on reverse communication interface (RCI) referred to here as RCI ISS interface.The RCI ISS interface implements a group of user-callable routines that are used in thestep-by-step solving process of a symmetric positive definite system (RCI Conjugate GradientSolver, or RCI CG), and of a non-symmetric indefinite (non-degenerate) system (RCI FlexibleGeneralized Minimal RESidual Solver, or RCI FGMRES) of linear algebraic equations and exploitsthe general RCI scheme described in [Dong95]. The terms and concepts required to understandthe use of the Intel MKL RCI ISS subroutines are discussed in the Linear Solvers Basics. RCImeans that user himself must perform certain operations for the solver (for example,matrix-vector multiplications). When the solver needs the results of such operations, the usermust pass them to the solver. This gives the great universality to the solver as it is independentof the specific implementation of the operations like the matrix-vector multiplication. However,this approach requires some additional work from the user. To simplify this task, the user canuse the built-in sparse matrix-vector multiplications and triangular solvers routines (see SparseBLAS Level 2 and Level 3.

NOTE. The RCI CG solver is implemented in two versions: for system of equations withsingle right hand side, and for system of equations with multiple right hand sides.

The CG method may fail to compute the solution or compute the wrong solution if thematrix of the system is not symmetric and positive definite.

The FGMRES method may fail if the matrix is degenerate.

The solving process is conceptually divided into four steps, as shown in the Table 8-4 , thatlists the names of the routines, and describes their general use.

Table 8-4 RCI ISS Interface Routines

DescriptionRoutine

Initializes the solver.dcg_init, dcgmrhs_init,dfgmres_init

Checks the consistency and correctness of the user defineddata.dcg_check, dcgmrhs_check,

dfgmres_check

2151


DescriptionRoutine

Computes the approximate solution vector.dcg, dcgmrhs, dfgmres

Retrieves the number of the current iteration.dcg_get, dcgmrhs_get, dfgmres_get

The Intel MKL RCI ISS interface routines are normally invoked in the order in which they arelisted in Table8-4 , with the exception of dcg_get, dcgmrhs_get, and dfgmres_get routinesthat can be invoked at any place in the code. However, in this case some precautions shouldbe taken to avoid the wrong results. Advanced users can change that order if they need it. Forothers it is strongly recommended to follow the above order of calls.

The following diagram in Figure 8-3 indicates the typical order in which the RCI ISS interfaceroutines can be invoked.

Figure 8-3 Typical Order for Invoking RCI ISS interface Routines

2152


Figure 8-4 and Figure 8-5 show the general schemes of using the RCI CG and RCI FGMRESroutines respectively.

Figure 8-4 General Scheme of Using RCI CG Routines

...

generate matrix A

generate preconditioner C (optional)

call dcg_init(n, x, b, RCI_request, ipar, dpar, tmp)

change parameters in ipar, dpar if necessary

call dcg_check(n, x, b, RCI_request, ipar, dpar, tmp)

1 call dcg(n, x, b, RCI_request, ipar, dpar, tmp)

if (RCI_request.eq.1) then

multiply the matrix A by tmp(1:n,1) and put the result in tmp(1:n,2)

It is possible to use MKL Sparse BLAS Level 2 subroutines for this operation

c proceed with CG iterations

goto 1

endif

if (RCI_request.eq.2)then

do the stopping test

if (test not passed) then


go to 1

else

c stop CG iterations

goto 2

endif

endif

if (RCI_request.eq.3) then (optional)

apply the preconditioner C inverse to tmp(1:n,3) and put the result in tmp(1:n,4)

2153



goto 1

end

2 call dcg_get(n, x, b, RCI_request, ipar, dpar, tmp, itercount)

current iteration number is in itercount

the computed approximation is in the array x

Figure 8-5 General Scheme of Using RCI FGMRES Routines

...

generate matrix A


call dfgmres_init(n, x, b, RCI_request, ipar, dpar, tmp)


call dfgmres_check(n, x, b, RCI_request, ipar, dpar, tmp)

1 call dfgmres(n, x, b, RCI_request, ipar, dpar, tmp)


multiply the matrix A by tmp(ipar(22)) and put the result in tmp(ipar(23))

It is possible to use MKL Sparse BLAS Level 2 subroutines for this operation

c proceed with FGMRES iterations

goto 1

endif





go to 1

else

c stop FGMRES iterations

goto 2

2154


endif

endif

if (RCI_request.eq.3) then (optional)

apply the preconditioner C inverse to tmp(ipar(22)) and put the result in tmp(ipar(23))


goto 1

endif


check the norm of the next orthogonal vector, it is contained in dpar(7)

if (the norm is not zero up to rounding/computational errors) then


goto 1

else


goto 2

endif

endif

2 call dfgmres_get(n, x, b, RCI_request, ipar, dpar, tmp, itercount)



Note that for the FGMRES method the array x initially contains the current initial approximationto the solution that can be updated only by calling the dfgmres_get routine that updates thesolution in accordance with the computations performed by the dfgmres routine.

The pseudo codes in these figures demonstrate two main differences in use of RCI CG and RCIFGMRES interfaces. The first difference relates to the RCI_request=3 (different locations inthe tmp array which is 2-dimensional for CG and 1-dimensional for FGMRES), the seconddifference relates to RCI_request=4 (RCI CG interface never produces RCI_request=4).

You can find example codes that use RCI ISS interface routines to solve systems of linearequations in the Iterative Sparse Solver Code Example section in the Appendix C.

2155


CG Interface Description

All types in this documentation refer to the standard Fortran types, INTEGER, and DOUBLEPRECISION.

C and C++ programmers should refer to the section Calling Sparse Solver Routines From C/C++for information on mapping Fortran types to C/C++ types.

Each routine for the RCI CG solver is implemented in two versions: for system of equationswith single right hand side (SRHS), and for system of equations with multiple right hand sides(MRHS). The routines for the system with MRHS contain the suffix mrhs in their names.

NOTE. The routines for the system with MRHS can be used with the names withoutsuffix mrhs. To do this, the user must switch on the compiler's preprocessor and includethe files mkl_solver.h for C/C++, or mkl_solver.f77 for FORTRAN.

Routines Options

All of the RCI CG routines have parameters for passing various options to the routines. Thevalues for these parameters should be specified very carefully (see CG Common Parameters),and they can be changed during computations according to the user's needs.

NOTE. Users must provide correct and consistent parameters to the subroutines toavoid fails or wrong results.

User Data Arrays

Many of the RCI CG routines take arrays of user data as input. For example, user arrays arepassed to the routine dcg to compute the solution of a system of linear algebraic equations.The Intel MKL RCI CG routines do not make copies of the user input arrays to minimize storagerequirements and improve overall run-time efficiency.

CG Common Parameters

NOTE. The default and initial values listed below are assigned to the parameters by thecalling the dcg_init/dcgmrhs_init routine.

2156


- INTEGER, this parameter sets the size of the problem in thedcg_init/dcgmrhs_init routine. All other routines uses ipar(1)parameter instead.

n

- DOUBLE PRECISION array of size n for SRHS, or matrix of sizen by nrhs for MRHS. This parameter contains the currentapproximation to the solution. Before the first call to thedcg/dcgmrhs routine, it contains the initial approximation to thesolution.

x

- INTEGER, this parameter sets the number of right-hand sides forMRHS routines.

nrhs

- DOUBLE PRECISION array containing the single right-hand sidevector, or matrix (nrhs, n)containing the right-hand side vectors.

b

- INTEGER, this parameter is used to inform about the result ofwork of the RCI CG routines. The negative values of the parameterindicate that the routine is completed with errors or warnings. The

RCI_request

0 value indicates the successful completion of the task. The positivevalues mean that the user must perform certain actions,specifically:

- multiply the matrix by tmp(1:n,1), put theresult in tmp(1:n,2), and return the controlto the dcg/dcgmrhs routine;

RCI_request= 1

- perform the stopping test(s). If they fail,return the control to the dcg/dcgmrhs routine.Otherwise, the solution is found and stored inthe x;

RCI_request= 2

- for SRHS: apply the preconditioner totmp(1:n,3), put the result in tmp(1:n,4),and return the control to the dcg routine;

RCI_request= 3

- for MRHS: apply the preconditioner totmp(:,3+ipar(3)), put the result intmp(:,3), and return the control to thedcgmrhs routine.

Note that the dcg_get/dcgmrhs_get routine does not change theparameter RCI_request. This allows user to use this routine insidethe Reverse Communication computations.

2157


- INTEGER array, of size 128 for SRHS, and of size(128+2*nrhs)for MRHS ; this parameter is used to specify theinteger set of data for the RCI CG computations:

ipar

- specifies the size of the problem. Thedcg_init/dcgmrhs_init routine assignesipar(1)=n. All other routines uses thisparameter instead of n. There is no defaultvalue for this parameter.

ipar(1)

- specifies the type of output for error andwarning messages that are generated by theRCI CG routines. The default value 6 means

ipar(2)

that all messages are displayed on the screen.Otherwise the error and warning messagesare written to the newly created filesdcg_errors.txt and dcg_check_warnings.txtrespectively. Note that if ipar(6) andipar(7) parameters are set to 0, error andwarning messages are not generated at all.

- for SRHS: contains the current stage of theRCI CG computations, the initial value is 1;

ipar(3)

- for MRHS: contains the right-hand side forwhich the calculations are currently performed.

NOTE. It is highly non-recommendedto alter this variable duringcomputations.

- contains the current iteration number, theinitial value is 0.

ipar(4)

- specifies the maximum number of iterations,the default value is min{150,n} .

ipar(5)

- if the value is not equal to 0, the routinesoutput error messages in accordance with theparameter ipar(2). Otherwise, the routines

ipar(6)

2158


do not output error messages at all, but theyreturn a negative value of the parameterRCI_request. The default value is 1.

- if the value is not equal to 0, the routinesoutput warning messages in accordance withthe parameter ipar(2). Otherwise, the

ipar(7)

routines do not output warning messages atall, but they return a negative value of theparameter RCI_request. The default value is1.

- if the value is not equal to 0, thedcg/dcgmrhs routine performs the stoppingtest for the maximum number of iterations,

ipar(8)

namely, ipar(4)≤ipar(5). Otherwise, themethod is stopped and corresponding value isassigned to the RCI_request. If the value is0, the routine does not perform this stoppingtest. The default value is 1.

- if the value is not equal to 0, thedcg/dcgmrhs routine performs the residualstopping test, namely,

ipar(9)

dpar(5)≤dpar(4)=dpar(1)*dpar(3)+dpar(2).Otherwise, the method is stopped andcorresponding value is assigned to theRCI_request. If the value is 0, the routinedoes not perform this stopping test. Thedefault value is 0.

- if the value is not equal to 0, thedcg/dcgmrhs routine requests for the userdefined stopping test by setting

ipar(10)

RCI_request=2. If the value is 0, the routinedoes not perform the user defined stoppingtest. The default value is 1.

NOTE. At least one of the parametersipar(8)-ipar(10) must be set to 1.

2159


- if the value is equal to 0, the dcg/dcgmrhsroutine runs the non-preconditioned versionof the corresponding Conjugate Gradient

ipar(11),

method. Otherwise, the routine runs thepreconditioned version of the ConjugateGradient method, and asks the user to performthe preconditioning step by setting theparameter RCI_request=3. The default valueis 0.

are reserved and not used in the current RCICG SRHS and MRHS routines respectively.

ipar(11:128),ipar(11:128+2*nrhs)

NOTE. Advanced users can define thearray in the code using RCI CG SRHS asfollows: INTEGER ipar(11). However,to guarantee the compatibility with thefuture releases of the Intel MKL it ishighly recommended to declare the arrayipar of length 128.

- DOUBLE PRECISION array, for SRHS of size 128, for MRHS ofsize (128+2*nrhs); this parameter is used to specify the doubleprecision set of data for the RCI CG computations, specifically:

dpar

- specifies the relative tolerance, the defaultvalue is 1.0D-6;

dpar(1)

- specifies the absolute tolerance, the defaultvalue is 0.0D-0;

dpar(2)

- specifies the square norm of initial residual(if it is computed in the dcg/dcgmrhs routine),the initial value is 0;

dpar(3)

- service variable, it is equal todpar(1)*dpar(3)+dpar(2) (if it is computedin the dcg/dcgmrhs routine), the initial valueis 0;

dpar(4)

- specifies the square norm of current residual,the initial value is 0.0;

dpar(5)

2160


- specifies the square norm of residual fromthe previous iteration step (if available), theinitial value is 0.0;

dpar(6)

- contains the "alpha" parameter of the CGmethod, the initial value is 0.0;

dpar(7)

- contains the "beta" parameter of the CGmethod, it is equal to dpar(5)/dpar(6), theinitial value is 0.0;

dpar(8)

are reserved and not used in the current RCICG SRHS and MRHS routines respectively.

dpar(9:128),dpar(9:128+2*nrhs)

NOTE. Advanced users can define thisarray in the code using RCI CG SRHS asfollows: DOUBLE PRECISION dpar(8).However, to guarantee the compatibilitywith the future releases of the Intel MKLit is highly recommended to declare thearray dpar of length 128.

- DOUBLE PRECISION array, for SRHS of size (n,4), for MRHS ofsize (n,3+nrhs); this parameter is used to supply the doubleprecision temporary space for the RCI CG computations,specifically:

tmp

- specifies the current search direction. Theinitial value is 0.0;

tmp(:,1)

- contains the matrix multiplied by the currentsearch direction. The initial value is 0.0;

tmp(:,2)

- contains the current residual. The initial valueis 0.0;

tmp(:,3)

- contains the inverse of the preconditionerapplied to the current residual. There is noinitial value for this parameter.

tmp(:,4)

2161


NOTE. Advanced users can define thisarray in the code using RCI CG SRHS asDOUBLE PRECISIONtmp(n,3) if they runonly non-preconditioned CG iterations.

FGMRES Interface Description



Routines Options

All of the RCI FGMRES routines have parameters for passing various options to the routines.The values for these parameters should be specified very carefully (see FGMRES CommonParameters), and they can be changed during computations according to the user's needs.

NOTE. Users must provide correct and consistent parameters to the subroutines toavoid fails or wrong results.

User Data Arrays

Many of the RCI FGMRES routines take arrays of user data as input. For example, user arraysare passed to the routine>dfgmres to compute the solution of a system of linear algebraicequations. In order to minimize storage requirements and improve overall run-time efficiency,the Intel MKL RCI FGMRES routines do not make copies of the user input arrays.

FGMRES Common Parameters

NOTE. The default and initial values listed below are assigned to the parameters by thecalling thedfgmres_init routine.

2162


- INTEGER, this parameter sets the size of the problem in thedfgmres_init routine. All other routines uses ipar(1) parameterinstead.

n

- DOUBLE PRECISION array, this parameter contains the currentapproximation to the solution vector. Before the first call to thedfgmres routine, it contains the initial approximation to thesolution vector.

x

- DOUBLE PRECISION array, this parameter contains the right-handside vector. Depending on user requests, it may contain theapproximate solution later.

b

- INTEGER, this parameter is used to inform about the result ofwork of the RCI FGMRES routines. The negative values of theparameter indicate that the routine is completed with errors or

RCI_request

warnings. The 0 value indicates the successful completion of thetask. The positive values mean that the user must perform certainactions, specifically:

- multiply the matrix by tmp(ipar(22)), putthe result in tmp(ipar(23)), and return thecontrol to the dfgmres routine;

RCI_request= 1

- perform the stopping test(s). If they fail,return the control to the dfgres routine.Otherwise, the solution can be updated by asubsequent call to dfgmres_get routine;

RCI_request= 2

- apply the preconditioner to tmp(ipar(22)),put the result in tmp(ipar(23)), and returnthe control to the dfgmres routine.

RCI_request= 3

- check if the norm of the current orthogonalvector is not zero up torounding/computational errors. Return the

RCI_request= 4

control to the dfgmres routine if it is not zero,otherwise complete the solution process bycalling dfgmres_get routine.

- INTEGER array, this parameter is used to specify the integer setof data for the RCI FGMRES computations:

ipar(128)

2163


- specifies the size of the problem. Thedfgmres_init routine assignes ipar(1)=n.All other routines uses this parameter insteadof n. There is no default value for thisparameter.

ipar(1)

- specifies the type of output for error andwarning messages that are generated by theRCI FGMRES routines. The default value 6

ipar(2)

means that all messages are displayed on thescreen. Otherwise the error and warningmessages are written to the newly created fileMKL_RCI_FGMRES_Log.txt. Note that ifipar(6) and ipar(7) parameters are set to0, error and warning messages are notgenerated at all.

- contains the current stage of the RCIFGMRES computations, the initial value is 1.

ipar(3)


- contains the current iteration number, theinitial value is 0.

ipar(4)

- specifies the maximum number of iterations,the default value is min {150,n}.

ipar(5)

- if the value is not equal to 0, the routinesoutput error messages in accordance with theparameter ipar(2). Otherwise, the routines

ipar(6)

do not output error messages at all, but theyreturn a negative value of the parameterRCI_request. The default value is 1.

- if the value is not equal to 0, the routinesoutput warning messages in accordance withthe parameter ipar(2). Otherwise, the

ipar(7)

routines do not output warning messages at

2164


all, but they return a negative value of theparameter RCI_request. The default value is1.

- if the value is not equal to 0, the dfmresroutine performs the stopping test for themaximum number of iterations, namely,

ipar(8)

ipar(4)≤ipar(5). Otherwise, the method isstopped and corresponding value is assignedto the RCI_request. If the value is 0, thedfgmres routine does not perform thisstopping test. The default value is 1.

- if the value is not equal to 0, the dfgmresroutine performs the residual stopping test,

namely, dpar(5)≤dpar(4) =

ipar(9)

dpar(1)·dpar(3)+dpar(2). If the criterionis fulfilled, the method is stopped andcorresponding value is assigned to theRCI_request. If the value is 0, the dfgmresroutine does not perform this stopping test.The default value is 0.

- if the value is not equal to 0, the dfgmresroutine requests for the user defined stoppingtest by setting RCI_request=2. If the value

ipar(10)

is 0, the dfgmres routine does not performthe user defined stopping test. The defaultvalue is 1.

NOTE. At least one of the parametersipar(8)-ipar(10) must be set to 1.

- if the value is equal to 0, the dfgmresroutine runs the non-preconditioned versionof the FGMRES method. Otherwise, the routine

ipar(11)

runs the preconditioned version of the FGMRES

2165


method, and asks the user to perform thepreconditioning step by setting the parameterRCI_request=3. The default value is 0.

- if the value is not equal to 0, the dfgmresroutine performs the automatic test for zeronorm of the currently generated vector,

ipar(12)

namely, dpar(7)≤dpar(8), where dpar(8)contains the tolerance value. Otherwise, theroutine asks the user to perform this check bysetting the parameter RCI_request=4. Thedefault value is 0.

- if the value is equal to 0, the dfgmres_getroutine updates the solution to the vector xaccording to the computations done by the

ipar(13)

dfgmres routine. If the value is positive, theroutine writes the solution to the right handside vector b. If the value is negative, theroutine returns only the number of the currentiteration, and does not update the solution.The default value is 0

NOTE. Advanced users may use thedfgmres_get routine at any place in thecode. In this case special attentionshould be paid to the parameteripar(13). The RCI FGMRES iterationscan be continued after the call todfgmres_get routine only if theparameter ipar(13) is not equal to zero.If ipar(13) is positive, then the updatedsolution will overwrite the right hand sidein the vector b. If the user wants to runthe restarted version of FGMRES with thesame right hand side, it should be savedin a different memory location before thefirst call to dfgmres_get routine withpositive ipar(13).

2166


- contains the internal iteration counter thatcounts the number of iterations before therestart takes place. The initial value is 0.

ipar(14)


- specifies the length of the non-restartedFGMRES iterations. To run the restartedversion of the FGMRES method, the user

ipar(15)

should assign to ipar(15) the number ofiterations before the restart takes place. Thedefault value is min {150, n}, that is, bydefault the non-restarted version of FGMRESmethod is used.

- service variable, specifies the location of therotated Hessenberg matrix from which thematrix stored in the packed format (see“Matrix Arguments” in the Appendix B fordetails) is started in the tmp array.

ipar(16)

- service variable, specifies the location of therotation cosines from which the vector ofcosines is started in the tmp array.

ipar(17)

- service variable, specifies the location of therotation sines from which the vector of sinesis started in the tmp array.

ipar(18)

- service variable, specifies the location of therotated residual vector from which the vectoris started in the tmp array.

ipar(19)

- service variable, specifies the location of theleast squares solution vector from which thevector is started in the tmp array.

ipar(20)

- service variable, specifies the location of theset of preconditioned vectors from which theset is started in the tmp array. The memory

ipar(21)

2167


locations in the tmp array starting fromipar(21)are used only for preconditionedFGMRES method.

- specifies the memory location from whichthe first vector (source) used in operationsrequested via RCI_request is started in thetmp array.

ipar(22)

- specifies the memory location from whichthe second vector (source) used in operationsrequested via RCI_request is started in thetmp array.

ipar(23)

are reserved and not used in the current RCIFGMRES routines.

ipar(24:128)

NOTE. Advanced users can define thearray in the code as follows: INTEGERipar(23). However, to guarantee thecompatibility with the future releases ofthe Intel MKL it is highly recommendedto declare the array ipar of length 128.

- DOUBLE PRECISION array, this parameter is used to specify thedouble precision set of data for the RCI CG computations,specifically:

dpar(128)

- specifies the relative tolerance, the defaultvalue is 1.0D-6;

dpar(1)

- specifies the absolute tolerance, the defaultvalue is 0.0D-0;

dpar(2)

- pecifies the Euclidean norm of the initialresidual (if it is computed in the dfgmresroutine), the initial value is 0.0;

dpar(3)

- service variable, it is equal todpar(1)*dpar(3)+dpar(2) (if it is computedin the dfgmres routine), the initial value is0.0;

dpar(4)

2168


- specifies the Euclidean norm of the currentresidual, the initial value is 0.0;

dpar(5)

- specifies the Euclidean norm of residual fromthe previous iteration step (if available), theinitial value is 0.0;

dpar(6)

- contains the norm of the generated vector,the initial value is 0.0;

dpar(7)

NOTE. For reference only: in terms of[Saad03] this parameter is the coefficienthk+1,k of the Hessenberg matrix.

- contains the tolerance for the "zero" normof the currently generated vector, the defaultvalue is 1.0D-12.

dpar(8)

are reserved and not used in the current RCIFGMRES routines.

dpar(9:128)

NOTE. Advanced users can define thisarray in the code as follows: DOUBLEPRECISION dpar(8). However, toguarantee the compatibility with thefuture releases of the Intel MKL it ishighly recommended to declare the arraydpar of length 128.

-DOUBLE PRECISION array of size ((2*ipar(15)+1)*n +ipar(15)*(ipar(15)+9)/2 + 1)), this parameter is used tosupply the double precision temporary space for the RCI FGMREScomputations, specifically:

tmp

- contains the sequence of generated byFGMRES method vectors. The initial value is0.0 for the first part of this memory of lengthn;

tmp(1:ipar(16)-1)

2169


contains the rotated Hessenberg matrixgenerated by FGMRES method stored inthe packed format. There is no initialvalue for this part of tmp array;

tmp(ipar(16):ipar(17)-1)

- contains the rotation cosines vectorgenerated by FGMRES method. There is noinitial value for this part of tmp array;


contains the rotation sines vector generatedby FGMRES method. There is no initial valuefor this part of tmp array;


contains the rotated residual vector generatedby FGMRES method. There is no initial valuefor this part of tmp array;


contains the solution vector to the leastsquares problem generated by FGMRESmethod. There is no initial value for this partof tmp array;


- - contains the set of preconditioned vectorsgenerated for FGMRES method by the user.This part of tmp array is not used if

tmp(ipar(21):)

non-preconditioned version of FGMRES methodis called. There is no initial value for this partof tmp array.

NOTE. Advanced users can define thisarray in the code as DOUBLE PRECISIONtmp((2*ipar(15)+1)*n +ipar(15)*(ipar(15)+9)/2 + 1)) ifthey run only non-preconditionedFGMRES iterations.

2170


dcg_initInitializes the solver.

Syntax

dcg_init(n, x, b, RCI_request, ipar, dpar, tmp)

Input Parameters

INTEGER. Contains the size of the problem, and size ofarrays x and b.

n

DOUBLE PRECISION array of size n. Contains the initialapproximation to the solution vector. Normally it is equalto 0 or to b.

x

DOUBLE PRECISION array of size n. Contains the right-handside vector.

b

Output Parameters

INTEGER. Informs about the task completion.RCI_request

INTEGER array of size 128. Refer to the CG CommonParameters.

ipar

DOUBLE PRECISION array of size 128. Refer to the CGCommon Parameters.

dpar

DOUBLE PRECISION array of size (n,4). Refer to the CGCommon Parameters.

tmp

Description

The routine dcg_init is called to initialize the solver. After initialization all subsequentinvocations of Intel MKL RCI CG routines can use the values of all parameters that are returnedby dcg_init. Advanced users can skip this step and set the values to these parameters directlyin the corresponding routines.

WARNING. Users can modify the contents of these arrays after they are passed to thesolver routine only if they are sure that the values are correct and consistent. Basic checkfor correctness and consistency can be done by calling the dcg_check routine, but itdoes not guarantee that the method will work correctly.

2171


Return Values

The routine completed task normally.RCI_request= 0

The routine failed to complete the task.RCI_request= -10000

dcg_checkChecks the consistency and correctness of the userdefined data.

Syntax

dcg_check(n, x, b, RCI_request, ipar, dpar, tmp)

Input Parameters


n


x


b

Output Parameters


INTEGERarray of size 128. Refer to the CG CommonParameters.

ipar


dpar


tmp

2172


Description

The routine dcg_check checks the consistency and correctness of the parameters to be passedto the solver routine dcg. However this operation does not guarantee that the method will beable to produce the correct result. It only reduces the chance to make a mistake in theparameters of the method. Advanced users can skip it if they are sure that the correct data isspecified in the solver parameters.

WARNING. Users can modify the contents of these arrays after they are passed to thesolver routine only if they are sure that the values are correct and consistent. Basic checkfor correctness and consistency can be done by calling the dcg_check routine, but itdoes not guarantee that the method will work correctly.

Note that the lengths of all vectors are assumed to have been defined in a previous call todcg_init subroutine.

Return Values


The routine is interrupted, errors occur.RCI_request= -1100

The routine returns some warning messages.RCI_request= -1001

The routine changed some parameters to make themconsistent or correct.

RCI_request= -1010

The routine returns some warning messages andchanged some parameters.

RCI_request= -1011

dcgComputes the approximate solution vector.

Syntax

dcg(n, x, b, RCI_request, ipar, dpar, tmp)

Input Parameters


n

2173


DOUBLE PRECISION array of size n. Contains the initialapproximation to the solution vector.

x


b


tmp

Output Parameters

INTEGER. Informs about the task completion status.RCI_request

DOUBLE PRECISION array of size n. Contains the updatedapproximation to the solution vector.

x


ipar


dpar


tmp

Description

The routine dcg computes the approximate solution vector using the CG method [Young71].The value that was in the vector x before the first call, the routine dcg uses as an initialapproximation to the solution. The parameter RCI_request inform the user about taskcompletion status and ask for results of certain operations that are required to the solver.

Note that the lengths of all vectors are assumed to have been defined in a previous call to thedcg_init routine.

Return Values

The routine completed task normally, the solutionis found and stored in the vector x. This occurs onlyif the stopping tests are fully automatic. For the userdefined stopping tests, see the comments to theRCI_request= 2.

RCI_request=0

2174


The routine is interrupted because the maximalnumber of iterations is reached, but the relativestopping criterion is not satisfied (this occurs only ifboth tests are requested by the user).

RCI_request=-1

The routine is interrupted because the attempt todivide by zero occurs. This happens if the matrix is(almost) non-positive definite.

RCI_request=-2

The routine is interrupted because the residual normis invalid. (Probably, the data in dpar(6) werealtered outside of the routine, or the dcg_checkroutine was not called).

RCI_request=- 10

The routine is interrupted because it enters theinfinite cycle. (Probably, the data in ipar(8),ipar(9), ipar(10) were altered outside of theroutine, or the dcg_check routine was not called).

RCI_request=-11

Asks user to multiply the matrix by tmp(1:n,1),put the result in the tmp(1:n,2), and return thecontrol back to the routine dcg.

RCI_request= 1

Asks user to perform the stopping test(s). If theyfail, the user should return the control back to thedcg routine. Otherwise, the solution is found andstored in the vector x.

RCI_request= 2

Asks user to apply the preconditioner to tmp(:,3),put the result in the tmp(:,4), and return thecontrol back to the routine dcg.

RCI_request= 3

dcg_getRetrieves the number of the current iteration.

Syntax

dcg_get(n, x, b, RCI_request, ipar, dpar, tmp, itercount)

Input Parameters


n

2175


DOUBLE PRECISION array of size n. Contains the initialapproximation vector to the solution.

x


b

INTEGER. This parameter is not used.RCI_request


ipar


dpar


tmp

Output Parameters

INTEGER argument. Contains the value of the currentiteration number.

itercount

Description

The routine dcg_get is called to retrieve the current iteration number of the solutions process.

Return Values

The routine dcg_get does not return any value.

dcgmrhs_initInitializes the RCI CG solver with MHRS.

Syntax

dcgmrhs_init(n, x, nrhs, b, method, RCI_request, ipar, dpar, tmp)

Input Parameters


n

DOUBLE PRECISION matrix of size n by nrhs. Contains theinitial approximation to the solution vectors. Normally it isequal to 0 or to b.

x

2176


INTEGER. This parameter sets the number of right-handsides.

nrhs

DOUBLE PRECISION matrix of size (nrhs,n). Contains theright-hand side vectors.

b

INTEGER. Specifies the method of solution:method1 - CG with multiple right hand sides; default value.2 - Block-CG (not supported now)

Output Parameters


INTEGER array of size (128+2*nrhs). Refer to the CGCommon Parameters.

ipar

DOUBLE PRECISION array of size (128+2*nrhs). Refer tothe CG Common Parameters.

dpar

DOUBLE PRECISION array of size (n, 3+nrhs). Refer tothe CG Common Parameters.

tmp

Description

The routine dcgmrhs_init is called to initialize the solver. After initialization all subsequentinvocations of Intel MKL RCI CG with multiple right hand sides (MRHS) routines can use thevalues of all parameters that are returned by dcgmrhs_init. Advanced users can skip thisstep and set the values to these parameters directly in the corresponding routines.

WARNING. Users can modify the contents of these arrays after they are passed to thesolver routine only if they are sure that the values are correct and consistent. Basic checkfor correctness and consistency can be done by calling the dcgmrhs_check routine, butit does not guarantee that the method will work correctly.

NOTE. To use this routine with the name dcg_init, the user must switch on thecompiler's preprocessor and include the files mkl_solver.h for C/C++, or mkl_solver.f77for FORTRAN.

Return Values


2177



dcgmrhs_checkChecks the consistency and correctness of the userdefined data.

Syntax

dcgmrhs_check(n, x, nrhs, b, RCI_request, ipar, dpar, tmp)

Input Parameters


n

DOUBLE PRECISION matrix of size n by nrhs. Contains theinitial approximation to the solution vectors. Normally it isequal to 0 or to b.

x


nrhs


b

Output Parameters



ipar


dpar


tmp

Description

The routine dcgmrhs_check checks the consistency and correctness of the parameters to bepassed to the solver routine dcgmrhs. However this operation does not guarantee that themethod will be able to produce the correct result. It only reduces the chance to make a mistakein the parameters of the method. Advanced users can skip it if they are sure that the correctdata is specified in the solver parameters.

2178


WARNING. Users can modify the contents of these arrays after they are passed to thesolver routine only if they are sure that the values are correct and consistent. Basic checkfor correctness and consistency can be done by calling the dcgmrhs_check routine, butit does not guarantee that the method will work correctly.

Note that the lengths of all vectors are assumed to have been defined in a previous call todcgmrhs_init subroutine.

NOTE. To use this routine with the name dcg_check, the user must switch on thecompiler's preprocessor and include the files mkl_solver.h for C/C++, or mkl_solver.f77for FORTRAN.

Return Values





RCI_request= -1010


RCI_request= -1011

dcgmrhsComputes the approximate solution vectors.

Syntax

dcgmrhs(n, x, nrhs, b, RCI_request, ipar, dpar, tmp)

Input Parameters


n

DOUBLE PRECISION matrix of size n by nrhs. Contains theinitial approximation to the solution vectors.

x

2179



nrhs


b


tmp

Output Parameters


DOUBLE PRECISION matrix of size n by nrhs. Contains theupdated approximation to the solution vectors.

x


ipar


dpar


tmp

Description

The routine dcgmrhs computes the approximate solution vectors using the CG with multipleright hand sides (MRHS) method [Young71]. The value that was in the x before the first call,the routine dcgmrhs uses as an initial approximation to the solution. The parameterRCI_request inform the user about task completion status and ask for results of certainoperations that are required to the solver.

Note that the lengths of all vectors are assumed to have been defined in a previous call to thedcgmrhs_init routine.

NOTE. To use this routine with the name dcg, the user must switch on the compiler'spreprocessor and include the files mkl_solver.h for C/C++, or mkl_solver.f77 for FORTRAN.

2180


Return Values

The routine completed task normally, the solutionis found and stored in the matrixr x. This occurs onlyif the stopping tests are fully automatic. For the userdefined stopping tests, see the comments to theRCI_request= 2.

RCI_request=0


RCI_request=-1

The routine is interrupted because the attempt todivide by zero occurs. This happens if the matrix is(almost) non-positive definite.

RCI_request=-2

The routine is interrupted because the residual normis invalid. (Probably, the data in dpar(6) werealtered outside of the routine, or the dcgmrhs_checkroutine was not called).

RCI_request=- 10

The routine is interrupted because it enters theinfinite cycle. (Probably, the data in ipar(8),ipar(9), ipar(10) were altered outside of theroutine, or the dcgmrhs_check routine was notcalled).

RCI_request=-11

Asks user to multiply the matrix by tmp(1:n,1),put the result in the tmp(1:n,2), and return thecontrol back to the routine dcgmrhs.

RCI_request= 1

Asks user to perform the stopping test(s). If theyfail, the user should return the control back to thedcgmrhs routine. Otherwise, the solution is foundand stored in the matrix x.

RCI_request= 2

Asks user to apply the preconditioner to tmp(:,3),put the result in the tmp(:,4), and return thecontrol back to the routine dcgmrhs.

RCI_request= 3

2181


dcgmrhs_getRetrieves the number of the current iteration.

Syntax

dcgmrhs_get(n, x, b, RCI_request, ipar, dpar, tmp, itercount)

Input Parameters


n

DOUBLE PRECISION matrix of size n by nrhs. Contains theinitial approximation to the solution vectors.

x


nrhs

DOUBLE PRECISION matrix of size (nrhs,n). Contains theright-hand side .

b

INTEGER. This parameter is not used.RCI_request


ipar


dpar


tmp

Output Parameters


itercount

Description

The routine dcgmrhs_get is called to retrieve the current iteration number of the solutionsprocess.

NOTE. To use this routine with the name dcg_get, the user must switch on the compiler'spreprocessor and include the files mkl_solver.h for C/C++, or mkl_solver.f77 for FORTRAN.

2182


Return Values

The routine dcgmrhs_get does not return any value.

dfgmres_initInitializes the solver.

Syntax

dfgmres_init(n, x, b, RCI_request, ipar, dpar, tmp)

Input Parameters


n


x


b

Output Parameters


INTEGER array of size 128. Refer to the FGMRES CommonParameters.

ipar

DOUBLE PRECISION array of size 128. Refer to the FGMRESCommon Parameters.

dpar

DOUBLE PRECISION array of size((2*ipar(15)+1)*n+ipar(15)*ipar(15)+9)/2 + 1).Refer to the FGMRES Common Parameters.

tmp

Description

The routine dfgmres_init is called to initialize the solver. After initialization all subsequentinvocations of Intel MKL RCI FGMRES routines can use the values of all parameters that arereturned by dfgmres_init. Advanced users can skip this step and set the values to theseparameters directly in the corresponding routines.

2183


WARNING. Users can modify the contents of these arrays after they are passed to thesolver routine only if they are sure that the values are correct and consistent. Basic checkfor correctness and consistency can be done by calling the dfgmres_check routine, butit does not guarantee that the method will work correctly.

Return Values



dfgmres_checkChecks the consistency and correctness of the userdefined data.

Syntax

dfgmres_check(n, x, b, RCI_request, ipar, dpar, tmp)

Input Parameters


n


x


b

Output Parameters



ipar


dpar

2184



tmp

Description

The routine dfgmres_check checks the consistency and correctness of the parameters to bepassed to the solver routine dfgmres. However this operation does not guarantee that themethod will be able to produce the correct result. It only reduces the chance to make a mistakein the parameters of the method. Advanced users can skip it if they are sure that the correctdata is specified in the solver parameters.

WARNING. Users can modify the contents of these arrays after they are passed to thesolver routine only if they are sure that the values are correct and consistent. Basic checkfor correctness and consistency can be done by calling the dfgmres_check routine, butit does not guarantee that the method will work correctly.

Note that the lengths of all vectors are assumed to have been defined in a previous call todfgmres_init subroutine.

Return Values





RCI_request= -1010


RCI_request= -1011

dfgmresMakes the FGMRES iterations.

Syntax

dfgmres(n, x, b, RCI_request, ipar, dpar, tmp)

2185


Input Parameters


n

DOUBLE PRECISION array of size n. Contains the initialapproximation to the solution vector.

x


b


tmp

Output Parameters



ipar


dpar


tmp

Description

The routine dfgmres performs the FGMRES iterations [Saad03]. The value that was in thevector x before the first call, the routine dfgmres uses as an initial approximation to the solution.To update the current approximation to the solution, the user should call dfgmres_get routine.The RCI FGMRES iterations can be continued after the call to the dfgmres_get routine only ifthe value of the parameter ipar(13) is not equal to 0 (default value). Note that the updatedsolution will overwrite the right hand side in the vector b if the parameter ipar(13) is positive,and it will be impossible to run the restarted version of FGMRES method. If the user wants tokeep the right hand side, it should be saved in a different memory location before the first callto dfgmres_get routine with positive ipar(13).

The parameter RCI_request inform the user about task completion status and ask for resultsof certain operations that are required to the solver.

Note that the lengths of all vectors are assumed to have been defined in a previous call to thedfgmres_init routine.

2186


Return Values

The routine completed task normally, the solutionis found. This occurs only if the stopping tests arefully automatic. For the user defined stopping tests,see the comments to the RCI_request= 2 or 4.

RCI_request= 0


RCI_request= -1

The routine is interrupted because the attempt todivide by zero occurs. Normally it happens if thematrix is (almost) degenerate. However it may

RCI_request= -10

happen if the parameter dpar is altered by mistale,or if the method is not stopped when the solution isfound.

The routine is interrupted because it enters theinfinite cycle. (Probably, the data in ipar(8),ipar(9), ipar(10) were altered outside of theroutine, or the dfgmres_check routine was notcalled).

RCI_request= -11

The routine is interrupted because the some errorsare found in the method parameters. Normally thishappens if the parameters ipar and dpar arealtered by mistake outside the routine.

RCI_request= -12

Asks user to multiply the matrix by tmp(ipar(22)),put the result in the tmp(ipar(23)), and return thecontrol back to the routine dfgmres.

RCI_request= 1

Asks user to perform the stopping test(s). If theyfail, the user should return the control back to thedfgmres routine. Otherwise, the FGMRES solution

RCI_request= 2

is found, and the user should call the dfgmres_getroutine and update the computed solution to the thevector x.

Asks user to apply the inverse preconditioner toipar(22), put the result in the ipar(23), andreturn the control back to the routine dfgmres.

RCI_request= 3

2187


Asks user to check the norm of the currentlygenerated vector. If it is not zero up tocomputational/rounding errors, the user should

RCI_request= 4

return the control back to the dfgmres routine.Otherwise, the FGMRES solution is found, and theuser should call the dfgmres_get routine andupdate the computed solution to the the vector x.

dfgmres_getRetrieves the number of the current iteration andupdates the solution.

Syntax

dfgmres_get(n, x, b, RCI_request, ipar, dpar, tmp, itercount)

Input Parameters


n


ipar


dpar


tmp

Output Parameters

DOUBLE PRECISION array of size n. If ipar(13)= 0, itcontains the updated approximation to the solution accordingto the computations done in dfgmres routine. Otherwise,it is not changed.

x

DOUBLE PRECISION array of size n. If ipar(13)> 0, itcontains the updated approximation to the solution accordingto the computations done in dfgmres routine. Otherwise,it is not changed.

b

2188




itercount

Description

The routine dfgmres_get is called to retrieve the current iteration number of the solutionsprocess and to update the solution according to the computations performed by the dfgmresroutine. To retrieve the current iteration number only, set the parameter ipar(13)= -1beforehand. This is normally recommended to do to proceed further with the computations. Ifthe intermediate solution is needed, the method parameters should be set properly, see fordetails FGMRES Common Parameters and Iterative Sparse Solver Code Examples section inthe Appendix C.

Return Values


The routine is interrupted because the some errorsare found in the method parameters. Normally thishappens if some of the parameters ipar and dparare altered by mistake outside the routine.

RCI_request= -12



Several aspects of the Intel MKL RCI ISS interface are platform-specific and language-specific.In order to promote portability across platforms and ease of use across different languages,users are encouraged to include one of the Intel MKL RCI ISS language-specific header files.Currently, there is one language specific header file for C programs.

These language-specific header file defines function prototypes and they are the following:

void dcg_init(int *n, double *x, double *b, int *rci_request, int *ipar,double dpar, double *tmp);

void dcg_check(int *n, double *x, double *b, int *rci_request, int *ipar,double dpar, double *tmp);

void dcg(int *n, double *x, double *b, int *rci_request, int *ipar, doubledpar, double *tmp);

void dcg_get(int *n, double *x, double *b, int *rci_request, int *ipar,double dpar, double *tmp, int *itercount);

2189


void dcgmrhs_init(int *n, double *x, int *nRhs, double *b, int *method, int*rci_request, int *ipar, double dpar, double *tmp);

void dcgmrhs_check(int *n, double *x, int *nRhs, double *b, int *rci_request,int *ipar, double dpar, double *tmp);

void dcgmrhs(int *n, double *x, int *nRhs, double *b, int *rci_request, int*ipar, double dpar, double *tmp);

void dcgmrhs_get(int *n, double *x, int *nRhs, double *b, int *rci_request,int *ipar, double dpar, double *tmp, int *itercount);

void dfgmres_init(int *n, double *x, double *b, int *rci_request, int *ipar,double dpar, double *tmp);

void dfgmres_check(int *n, double *x, double *b, int *rci_request, int *ipar,double dpar, double *tmp);

void dfgmres(int *n, double *x, double *b, int *rci_request, int *ipar,double dpar, double *tmp);

void dfgmres_get(int *n, double *x, double *b, int *rci_request, int *ipar,double dpar, double *tmp, int *itercount);

NOTE. Use of the Intel MKL RCI ISS software without including the language specificheader file is not supported.

Preconditioners or Accelerators based on Incomplete LUFactorization Technique

Usually, preconditioners or accelerators are used to accelerate an iterative solution process. Insome cases, their use can reduce dramatically the number of iterations and thus lead to bettersolver performance. Although the terms ‘preconditioner’ and ‘accelerator’ are synonyms,hereafter only term ‘preconditioner’ will be used.

Currently, Intel MKL provides one preconditioner for PARDISO CSR matrix format (see SparseMatrix Storage Format section). Full MKL compressed sparse row (CSR) format is not supported.The preconditioner is ILU0 preconditioner. It is based on a well-known factorization of theoriginal matrix into a product of two triangular matrices: low triangular and upper triangularmatrices. Usually, such a decomposition leads to some fill-in in the resulting matrix structureas compared to the original matrix. The distinctive feature of the ILU0 preconditioner is that itpreserves the structure of the original matrix in the result.

2190


The used algorithm is described in [Saad03]. The ILU0 preconditioner can be applied to anynon-degenerate matrix. It can be used alone or together with the MKL RCI FGMRES solver (seeSparse Solver Routines). It is not recommended to use the preconditioner with MKL RCI CGsolver because in general, it gives non-symmetric resulting matrix even if the original matrixis a symmetric one. Usually, an inverse of the preconditioner is required when it is used. Forthis purpose, for the ILU0 preconditioner a user should apply the Intel MKL triangular solverroutine mkl_dcsrtrsv twice: for the low triangular part of the preconditioner, and then for itsupper triangular part.

NOTE. Although ILU0 preconditioner can be applied to any non-degenerate matrix, insome cases the algorithm does not guarantee that it will terminate successfully andproduce the required result. The result of ILU0 routine can be checked in practice only.

Preconditioner may increase the number of iterations for an arbitrary case of the systemand initial guess and even ruin the convergence. It is user's responsibility to carefullyuse a suitable preconditioner.

General Scheme of Using ILU0 and RCI FGMRES Routines

The following pseudo code shows the general scheme of using the ILU0 preconditioner in theRCI FGMRES context.

...

generate matrix A


call dfgmres_init(n, x, b, RCI_request, ipar, dpar, tmp)


call dcsrilu0(n, a, ia, ja, bilu0, ipar, dpar, ierr)

call dfgmres_check(n, x, b, RCI_request, ipar, dpar, tmp)

1 call dfgmres(n, x, b, RCI_request, ipar, dpar, tmp)


multiply the matrix A by tmp(ipar(22)) and put the result in tmp(ipar(23))


goto 1

endif

2191






go to 1

else

c stop FGMRES iterations.

goto 2

endif

endif


c Below, trvec is an intermediate vector of length at least n

c Here is the recommended use of the result produced by the ILU0 routine.

c via standard Intel MKL Sparse Blas solver routine mkl_dcsrtrsv.

call mkl_dcsrtrsv('L','N','U',n,bilu0,ia,ja,tmp(ipar(22)),trvec)

call mkl_dcsrtrsv('U','N','N',n,bilu0,ia,ja,trvec,tmp(ipar(23)))


goto 1

endif


check the norm of the next orthogonal vector, it is contained in dpar(7)

if (the norm is not zero up to rounding/computational errors) then


goto 1

else


goto 2

2192


endif

endif

2 call dfgmres_get(n, x, b, RCI_request, ipar, dpar, tmp, itercount)



You can find example codes that use RCI ISS interface routines to solve systems of linearequations in the Iterative Sparse Solver Code Example section in the Appendix C.

ILU0 Preconditioner Interface Description

The terms and concepts required to understand the use of the Intel MKL preconritioner routineare discussed in the Linear Solvers Basics. In this manual, we use FORTRAN style notations.



User Data Arrays

The ILU0 routine takes arrays of user data as input, for example, user arrays representing theoriginal matrix. In order to minimize storage requirements and improve overall run-timeefficiency, the Intel MKL ILU0 routine does not make copies of the user input arrays.

ILU0 Parameters

The preconditioner has parameters for passing various options to the routine. The values forthese parameters should be specified very carefully, and they can be changed duringcomputations according to the user's needs.

Some parameters are common with the FGMRES Common Parameters. Their default and initialvalues are specified by the routine dfgmres_init only. However, user can redefine theseparameters by his own values. These parameters are listed below.

ipar(2) - specifies the type of output for error messages that are generated by the ILU0routine. The default value 6 means that all messages are displayed on the screen. Otherwisethe error messages are written to the newly created file MKL_PREC_log.txt. Note that if theparameter ipar(6) is set to 0, error messages are not generated at all.

2193


ipar(6) - if its value is not equal to 0, the routine returns error messages in accordance withthe parameter ipar(2). Otherwise, the routine does not generate error messages at all, butreturns a negative value of the parameter ierr. The default value is 1.

NOTE. Users must provide correct and consistent parameters to the routine to avoidfails or wrong results.

dcsrilu0ILU0 preconditioner based on incomplete LUfactorization of a sparse matrix in the CSR format(PARDISO variation).

Syntax

Fortran:

call dcsrilu0(n, a, ia, ja, bilu0, ipar, dpar, ierr)

C:

dcsrilu0(&n, a, ia, ja, bilu0, ipar, dpar, &ierr);

Input Parameters

INTEGER. Size (number of rows or columns) of the originalsquare n-by-n-matrix A.

n

DOUBLE PRECISION. Array containing the set of elementsof the matrix A. Its length is equal to the number of non-zeroelements in the matrix A. Refer to values array descriptionin the Sparse Matrix Storage Format section for more details.

a

INTEGER. Array of size (n+1) containing begin indices ofrows of matrix A such that ia(I) is the index in the arrayA of the first non-zero element from the row I. The value

ia

of the last element ia(n+1) is equal to the number ofnon-zeros in the matrix A plus one. Refer to the rowIndexarray description in the Sparse Matrix Storage Formatsection for more details.

2194


INTEGER. Array containing the column indices for eachnon-zero element of the matrix A. Its size is equal to thesize of the array a. Refer to the columns array descriptionin the Sparse Matrix Storage Format section for more details.

ja

NOTE. Column indices should be put in increasingorder for each row of matrix.

INTEGER array of size 128. This parameter is used to specifythe integer set of data for both the ILU0 and RCI FGMREScomputations. Refer to the ipar array description in the

ipar

FGMRES Common Parameters for more details on FGMRESparameter entries. The entries that are specific to ILU0 arelisted below.

- specifies how the routine operates ifzero diagonal element occurs duringcalculation. If this parameter is set to 0

ipar(31)

(default value set by the routinedfgmres_init) then that the calculationsare stopped and the routine returnsnon-zero error value. Otherwise the valueof the diagonal element is set to thespecified value and the calculations arecontinued.

NOTE. Advanced users can definethis array in the code as follows:INTEGER ipar(31). However, toguarantee the compatibility with thefuture releases of the Intel MKL it ishighly recommended to declare thearray ipar of length 128.

2195


DOUBLE PRECISION array of size 128. This parameter isused to specify the double precision set of data for both theILU0 and RCI FGMRES computations. Refer to the dpar

dpar

array description in the FGMRES Common Parameters formore details on FGMRES parameter entries. The entries thatare specific to ILU0 are listed below.

- specifies the small value that iscompared with the diagonal elementsduring calculations; if the value of the

dpar(31)

diagonal element is smaller, then it is setto dpar(32), or the calculations arestopped, in accordance with ipar(31);the default value is 1.0D-16

NOTE. This parameter can be setto the negative value, because itsabsolute value is actually used incalculations.

If this parameter is set to 0, thecomparison with the diagonalelement is not performed.

- specifies the value that is assigned tothe diagonal element if its value is lessthan dpar(31) (see above); the defaultvalue is 1.0D-10

dpar(32)

NOTE. Advanced users can definethis array in the code as follows:DOUBLE PRECOSIONdpar(32).However, to guarantee thecompatibility with the future releasesof the Intel MKL it is highlyrecommended to declare the arraydpar of length 128.

2196


Output Parameters

DOUBLE PRECISION. Array B stored in the PARDISO CSRformat containing non-zero elements of the resultingpreconditioning matrix. Its size is equal to the number of

bilu0

non-zero elements in the matrix A. Refer to the valuesarray description in the Sparse Matrix Storage Formatsection for more details.

INTEGER. Error flag, informs about the routine completionstatus.

ierr

NOTE. To present the resulting preconditioning matrix in the CSR format the arrays ia(row indices) and ja (column indices) of the input matrix must be used.

Description

The routine dcsrilu0 computes a preconditioner B [Saad03] of a given sparse matrix A storedin the CSR format accepted in PARDISO:

A~B=L*U , where L is a low triangular matrix with unit diagonal, U is an upper triangular matrixwith non-unit diagonal, and the portrait of the original matrix A is used to store the incompletefactors L and U.

Return Values

The routine completed task normally.ierr=0

The routine is interrupted, the error occurs: at leastone diagonal element is omitted from the matrix inCSR format (see Sparse Matrix Storage Format).

ierr=-101

The routine is interrupted because the matrixcontains zero diagonal element, routine can notperform operations.

ierr=-102

The routine is interrupted as the matrix contains toosmall diagonal element, and an overflow may occurbecause of the division by its value required tocomplete the task, or a bad approximation to ILU0with use of this element will be computed.

ierr=-103

2197


The routine is interrupted because the memory isinsufficient for the internal work array.

ierr=-104

The routine is interrupted because the input matrixsize n isless than or equal to 0.

ierr=-105

The routine is interrupted because the the columnindices ja are placed in not increasing order.

ierr=-106

Interfaces

Fortran 77 and Fortran 95:SUBROUTINE dcsrilu0 (n, a, ia, ja, bilu0, ipar, dpar, ierr)

INTEGER n, ierr, ipar(128)


DOUBLE PRECISION a(*), bilu0(*), dpar(128)

C:void dcsrilu0 (int *n, double *a, int *ia, int *ja, double *bilu0, int*ipar, double *dpar, int *ierr);

Calling Sparse Solver Routines From C/C++The calling interface for all of the Intel MKL sparse solver routines is designed to be used easilyfrom Fortran 77 or Fortran 90. However, any of these routines can be invoked directly from Cor C++ if users are familiar with the inter-language calling conventions of their platforms. Theseconventions include, but are not limited to, the argument passing mechanisms for the language,the data type mappings from Fortran to C/C++ and how Fortran external names are decoratedon the platform.

In order to promote portability and to avoid having most users deal with these issues, the Cheader files provide a set of macros and type definitions that are intended to hide theinter-language calling conventions and provide an interface to the Intel MKL sparse solverroutines that appears natural for C/C++.

2198


For example, consider a hypothetical library routine, foo, that takes real vector of length n,and returns an integer status. Fortran users would access such a function as:INTEGER n, status, foo

REAL x(*)

status = foo(x, n)

As noted above, for C users to invoke foo, they would need to know what C data typescorrespond to Fortran types INTEGER and REAL; what argument passing mechanism the Fortrancompiler uses; and what, if any, name decoration the is performed by the Fortran compilerwhen generating the external symbol foo.

However, by using the C specific header file, for example mkl_solver.h, the invocation of foo,within a C program would look like:#include "mkl_solver.h"

_INTEGER_t i, status;

_REAL_t x[];

status = foo( x, i );

Note that in the above example, the header file mkl_solver.h provides definitions for the types_INTEGER_t and _REAL_t that correspond to the Fortran types INTEGER and REAL.

In order to ease the use of Intel MKL sparse solver routines from C and C++, the generalapproach of providing C definitions of Fortran types is used throughout the library. Specifically,if an argument or result from a sparse solver is documented as having the Fortran languagespecific type XXX, then the C and C++ header files provide an appropriate C language typedefinitions for the name _XXX_t.

Caveat for C Users

One of the key differences between C/C++ and Fortran is the argument passing mechanismsfor the languages: Fortran programs use pass-by-reference semantics and C/C++ programsuse pass-by-value semantics. In the example in the previous section, the header file,mkl_solver.h, attempts to hide this difference, by defining a macro, foo that takes the addressof the appropriate arguments. For example, on Tru64 UNIX, mkl_solver.h would define themacro as:

#define foo(a,b) foo_((a), &(b))

2199


An important point to note when using the macro form of foo is how it deals with constants.If we write foo( x, 10 ), this is translated into foo_( x, &10 ). In a strictly ANSI compliantC compiler, it is not permissible to take the address of a constant, so a strictly conformingprogram would look like:

INTEGER_t iTen = 10;

_REAL_t * x;

status = foo( x, iTen );

However, some C compilers in a non-ANSI standard mode allow taking the address of a constantfor ease of use with Fortran programs. Thus, the form shown as foo( x, 10 ) is acceptablefor these compilers.

2200


9Vector Mathematical Functions

This chapter describes Vector Mathematical Functions Library (VML), which is designed to computemathematical functions on vector arguments. VML is an integral part of the Intel® MKL Kernel Library andthe VML terminology is used here for simplicity in discussing this group of functions.

VML includes a set of highly optimized implementations of certain computationally expensive coremathematical functions (power, trigonometric, exponential, hyperbolic etc.) that operate on vectors ofreal and complex numbers.

Application programs that might significantly improve performance with VML include nonlinear programmingsoftware, integrals computation, and many others.

VML functions are divided into the following groups according to the operations they perform:

• VML Mathematical Functions compute values of mathematical functions (such as sine, cosine,exponential, logarithm and so on) on vectors with unit increment indexing.

• VML Pack/Unpack Functions convert to and from vectors with positive increment indexing, vectorindexing and mask indexing (see Appendix B for details on vector indexing methods).

• VML Service Functions allow the user to set /get the accuracy mode, and set/get the error code.

VML mathematical functions take an input vector as argument, compute values of the respective functionelement-wise, and return the results in an output vector.

Data Types and Accuracy ModesMathematical and pack/unpack vector functions in VML have been implemented for vector argumentsof single and double precision real data. Both Fortran- and C-interfaces to all functions, includingVML service functions, are provided in the library. The differences in naming and calling the functionsfor Fortran- and C-interfaces are detailed in the Function Naming Conventions section below.

Each vector function from VML (for each data format) can work in two modes: High Accuracy (HA)and Low Accuracy (LA). For many functions, using the LA version will improve performance at thecost of accuracy.

For some cases, the advantage of relaxing the accuracy improves performance very little so thesame function is employed for both versions. Error behavior depends not only on whether the HAor LA version is chosen, but also depends on the processor on which the software runs.

In addition, special value behavior may differ between the HA and LA versions of the functions. Anyinformation on accuracy behavior can be found in the Intel MKL Release Notes.

2201

Switching between the two modes (HA and LA) is accomplished by using vmlSetMode(mode)(seeTable 9-12 ). The function vmlGetMode() will return the currently used mode. The HighAccuracy mode is used by default.

Function Naming ConventionsFull names of all VML functions include only lowercase letters for Fortran-interface, whereasfor C-interface names the lowercase letters are mixed with uppercase.

VML mathematical and pack/unpack function full names have the following structure:

v<name><mod>

The initial letter v is a prefix indicating that a function belongs to VML.

The field is a precision prefix that indicates the data type:

REAL for Fortran-interface, or float for C-interfaces

DOUBLE PRECISION for Fortran-interface, or double for C-interface.d

COMPLEX for Fortran-interface, or MKL_Complex8 for C-interface.c

DOUBLE COMPLEX for Fortran-interface, or MKL_Complex16 forC-interface.

z

The <name> field indicates the function short name, with some of its letters in uppercase forC-interface (see for example Table 9-2 or Table 9-11 ).

The <mod> field (written in uppercase for C-interface) is present in pack/unpack functions only;it indicates the indexing method used:

indexing with positive incrementi

indexing with index vectorv

indexing with mask vector.m

VML service function full names have the following structure:

vml<name>

where vml is a prefix indicating that a function belongs to VML, and <name> is the functionshort name, which includes some uppercase letters for C-ifglonterface (see Table 9-11 ). Tocall VML functions from an application program, use conventional function calls. For example,the VML exponential function for single precision real data can be called as

call vsexp (n, a, y ) for Fortran-interface, or

vsExp ( n, a, y); for C-interface.

2202

Functions Interface

The interface to VML functions includes function full names and the arguments list. The Fortran-and C-interface descriptions for different groups of VML functions are given below. Note thatsome functions (Div, Pow, and Atan2) have two input vectors a and b as their arguments,while SinCos function has two output vectors y and z.

VML Mathematical Functions

Fortran:

call v<name>( n, a, y )call v<name>( n, a, b, y )call v<name>( n, a, y, z )

C:

v<name>( n, a, y );v<name>( n, a, b, y );v<name>( n, a, y, z );

Pack Functions

Fortran:

call vpacki( n, a, inca, y )call vpackv( n, a, ia, y )call vpackm( n, a, ma, y )

C:

vPackI( n, a, inca, y );vPackV( n, a, ia, y );vPackM( n, a, ma, y );

Unpack Functions

Fortran:

call vunpacki( n, a, y, incy )

2203

Vector Mathematical Functions 9

call vunpackv( n, a, y, iy )call vunpackm( n, a, y, my )

C:

vUnpackI( n, a, y, incy );vUnpackV( n, a, y, iy );vUnpackM( n, a, y, my );

Service Functions

Fortran:

oldmode = vmlsetmode( mode )mode = vmlgetmode( )olderr = vmlseterrstatus ( err )err = vmlgeterrstatus( )olderr = vmlclearerrstatus( )oldcallback = vmlseterrorcallback( callback )callback = vmlgeterrorcallback( )oldcallback = vmlclearerrorcallback( )

C:

oldmode = vmlSetMode( mode );mode = vmlGetMode( void );olderr = vmlSetErrStatus ( err );err = vmlGetErrStatus( void );olderr = vmlClearErrStatus( void );oldcallback = vmlSetErrorCallBack( callback );callback = vmlGetErrorCallBack( void );oldcallback = vmlClearErrorCallBack( void );

Input Parameters

number of elements to be calculatedn

first input vectora

second input vectorb

2204

vector increment for the input vector ainca

index vector for the input vector aia

mask vector for the input vector ama

vector increment for the output vector yincy

index vector for the output vector yiy

mask vector for the output vector ymy

error codeerr

VML modemode

address of the callback functioncallback

Output Parameters

first output vectory

second output vectorz

error codeerr

VML modemode

former error codeolderr

former VML modeoldmode

address of the callback functioncallback

address of the former callback functionoldcallback

The data types of the parameters used in each function are specified in the respective functiondescription section. All VML mathematical functions can perform in-place operations, whichmeans that the same vector can be used as both input and output parameter. This holds truefor functions with two input vectors as well, in which case one of them may be overwritten withthe output vector. For functions with two output vectors, one of them may coincide with theinput vector. But partially overlapping input and output vectors could lead to unpredictableresults.

Vector Indexing MethodsCurrent VML mathematical functions work only with unit increment. Arrays with other increments,or more complicated indexing, can be accommodated by gathering the elements into a contiguousvector and then scattering them after the computation is complete.

Three following indexing methods are used to gather/scatter the vector elements in VML:

2205


• positive increment

• index vector

• mask vector.

The indexing method used in a particular function is indicated by the indexing modifier (seethe description of the <mod> field in Function Naming Conventions). For more information onindexing methods see Vector Arguments in VML in Appendix B.

Error DiagnosticsThe VML library has its own error handler. The only difference for C- and Fortran- interfaces isthat the Intel MKL error reporting routine XERBLA can be called after the Fortran- interface VMLfunction encounters an error, and this routine gets information on VML_STATUS_BADSIZE andVML_STATUS_BADMEM input errors (see Table 9-14 ).

The VML error handler has the following properties:

1. The Error Status (vmlErrStatus) global variable is set after each VML function call. Thepossible values of this variable are shown in the Table 9-14 .

2. Depending on the VML mode, the error handler function invokes:

• errno variable setting. The possible values are shown in the Table 9-1 .

• writing error text information to the stderr stream

• raising the appropriate exception on error, if necessary

• calling the additional error handler callback function.

Table 9-1 Set Values of the errno Variable

DescriptionValue of errno

No errors are detected.0The array dimension is not positive.EINVALNULL pointer is passed.EACCESAt least one of array values is out of a range of definition.EDOMAt least one of array values caused a singularity, overflow orunderflow.

ERANGE

2206

VML Mathematical FunctionsThis section describes VML functions which compute values of mathematical functions on realand complex vector arguments with unit increment.

Each function group is introduced by its short name, a brief description of its purpose, and thecalling sequence for each type of data both for Fortran- and C-interfaces, as well as a descriptionof the input/output arguments.

For all VML mathematical functions, the input range of parameters is equal to the mathematicalrange of definition in the set of defined values for the respective data type. Several VMLfunctions, specifically Div, Exp, Sinh, Cosh, and Pow, can result in an overflow. For thesefunctions, the respective input threshold values that mark off the precision overflow are specifiedin the function description section. Note that in these specifications, FLT_MAX denotes themaximum number representable in single precision real data type, while DBL_MAX denotes themaximum number representable in double precision real data type.

Table 9-2 lists available mathematical functions and data types associated with them.

Table 9-2 VML Mathematical Functions

DescriptionData TypesType of Distribution

Power and Root FunctionsInversion of the vector elementss, dInvDivide elements of one vector by elements of second vectors, d, c, zDivSquare root of vector elementss, dSqrtInverse square root of vector elementss, dInvSqrtCube root of vector elementss, dCbrtInverse cube root of vector elementss, dInvCbrtEach vector element raised to the specified powers, d, c, zPowEach vector element raised to the constant powers, d, c, zPowxSquare root of sum of squaress, dHypot

Exponential and Logarithmic FunctionsExponential of vector elementss, d, c, zExpNatural logarithm of vector elementss, d, c, zLnDenary logarithm of vector elementss, d, c, zLog10

Trigonometric FunctionsCosine of vector elementss, d, c, zCosSine of vector elementss, d, c, zSinSine and cosine of vector elementss, dSinCosTangent of vector elementss, d, c, zTan

2207


DescriptionData TypesType of Distribution

Inverse cosine of vector elementss, d, c, zAcosInverse sine of vector elementss, d, c, zAsinInverse tangent of vector elementss, d, c, zAtanFour-quadrant inverse tangent of elements of two vectorss, dAtan2

Hyperbolic FunctionsHyperbolic cosine of vector elementss, d, c, zCoshHyperbolic sine of vector elementss, d, c, zSinhHyperbolic tangent of vector elementss, d, c, zTanhInverse hyperbolic cosine (nonnegative) of vector elementss, d, c, zAcoshInverse hyperbolic sine of vector elementss, d, c, zAsinhComputes inverse hyperbolic tangent of vector elements.s, d, c, zAtanh

Special FunctionsError function value of vector elementss, dErfComplementary error function value of vector elementss, dErfcInverse error function value of vector elementss, dErfInv

Rounding FunctionsRounding towards minus infinitys, dFloorRounding towards plus infinitys, dCeilRounding towards zero infinitys, dTruncRounding to nearest integers, dRoundRounding according to current modes, dNearbyIntRounding according to current mode and raising inexact resultexception

s, dRint

Integer and fraction partss, dModf

InvPerforms element by element inversion of thevector.

Syntax

Fortran:

call vsinv( n, a, y )

call vdinv( n, a, y )

2208


C:

call vsInv( n, a, y );

call vdInv( n, a, y );

Input Parameters

DescriptionTypeName

Specifies the number of elements to be calculated.FORTRAN: INTEGER,INTENT(IN).

n

C: int.

FORTRAN: Array, specifies the input vector a.FORTRAN: REAL,INTENT(IN) for vsinv

a

C: Pointer to an array that contains the input vectora.DOUBLE PRECISION,

INTENT(IN) for vdinv

C: const float* for vsInv

const double* for vdInv

Output Parameters

DescriptionTypeName

FORTRAN: Array, specifies the outputvector y.

FORTRAN: REAL for vsinv

DOUBLE PRECISION for vdinv

y

C: Pointer to an array that contains theoutput vector y.

C: float* for vsInv

double* for vdInv

2209


DivPerforms element by element division of vector aby vector b

Syntax

Fortran:

call vsdiv( n, a, b, y )

call vddiv( n, a, b, y )

C:

vsDiv( n, a, b, y );

vdDiv( n, a, b, y );

Input Parameters

DescriptionTypeName

Specifies the number of elements to becalculated.

FORTRAN: INTEGER, INTENT(IN).

C: int.

n

FORTRAN: Arrays, specify the input vectorsa and b.

FORTRAN: REAL, INTENT(IN) forvsdiv

a, b

C: Pointers to arrays that contain the inputvectors a and b.

DOUBLE PRECISION, INTENT(IN)for vddiv

C: const float* for vsDiv

const double* for vdDiv

Table 9-3 Precision Overflow Thresholds for Div Function

Threshold Limitations on Input ParametersData Type

abs(a[i]) < abs(b[i]) * FLT_MAXsingle precision

abs(a[i]) < abs(b[i]) * DBL_MAXdouble precision

2210

Output Parameters

DescriptionTypeName


FORTRAN: REAL for vsdiv

DOUBLE PRECISION for vddiv

y


C: float* for vsDiv

double* for vdDiv

SqrtComputes a square root of vector elements.

Syntax

Fortran:

call vssqrt( n, a, y )

call vdsqrt( n, a, y )

call vcsqrt( n, a, y )

call vzsqrt( n, a, y )

C:

call vsSqrt( n, a, y );

call vdSqrt( n, a, y );

call vcSqrt( n, a, y );

call vzSqrt( n, a, y );

Input Parameters

DescriptionTypeName



C: int.

n

2211


DescriptionTypeName

FORTRAN: Array, specifies the input vectora.

FORTRAN: REAL, INTENT(IN) forvssqrt

a

C: Pointer to an array that contains the inputvector a.

DOUBLE PRECISION, INTENT(IN)for vdsqrt

COMPLEX, INTENT(IN) for vcsqrt

DOUBLE COMPLEX, INTENT(IN) forvzsqrt

C: const float* for vsSqrt

const double* for vdSqrt

const MKL_Complex8* for vcSqrt

const MKL_Complex16* forvzSqrt

Output Parameters

DescriptionTypeName


FORTRAN: REAL for vssqrt

DOUBLE PRECISION for vdsqrt

y


COMPLEX for vcsqrt

DOUBLE COMPLEX for vzsqrt

C: float* for vsSqrt

double* for vdSqrt

MKL_Complex8* for vcSqrt

MKL_Complex16* for vzSqrt

2212


InvSqrtComputes an inverse square root of vectorelements.

Syntax

Fortran:

call vsinvsqrt( n, a, y )

call vdinvsqrt( n, a, y )

C:

call vsInvSqrt( n, a, y );

call vdInvSqrt( n, a, y );

Input Parameters

DescriptionTypeName



C: int.

n


FORTRAN: REAL, INTENT(IN) forvsinvsqrt

a


DOUBLE PRECISION, INTENT(IN)for vdinvsqrt

C: const float* for vsInvSqrt

const double* for vdInvSqrt

Output Parameters

DescriptionTypeName


FORTRAN: REAL for vsinvsqrt

DOUBLE PRECISION for vdinvsqrt

y

2213


DescriptionTypeName


C: float* for vsInvSqrt

double* for vdInvSqrt

CbrtComputes a cube root of vector elements.

Syntax

Fortran:

call vscbrt( n, a, y )

call vdcbrt( n, a, y )

C:

call vsCbrt( n, a, y );

call vdCbrt( n, a, y );

Input Parameters

DescriptionTypeName



C: int.

n


FORTRAN: REAL, INTENT(IN) forvscbrt

a


DOUBLE PRECISION, INTENT(IN)for vdcbrt

C: const float* for vsCbrt

const double* for vdCbrt

2214


Output Parameters

DescriptionTypeName


FORTRAN: REAL for vscbrt

DOUBLE PRECISION for vdcbrt

y


C: float* for vsCbrt

double* for vdCbrt

InvCbrtComputes an inverse cube root of vector elements.

Syntax

Fortran:

call vsinvcbrt( n, a, y )

call vdinvcbrt( n, a, y )

C:

call vsInvCbrt( n, a, y );

call vdInvCbrt( n, a, y );

Input Parameters

DescriptionTypeName



int.

n


FORTRAN: REAL, INTENT(IN) forvsinvcbrt

a


DOUBLE PRECISION, INTENT(IN)for vdinvcbrt

C: const float* for vsInvCbrt

2215


DescriptionTypeName

const double* for vdInvCbrt

Output Parameters

DescriptionTypeName


FORTRAN: REAL for vsinvcbrt

DOUBLE PRECISION for vdinvcbrt

y


C: float* for vsInvCbrt

double* for vdInvCbrt

PowComputes a to the power b for elements of twovectors.

Syntax

Fortran:

call vspow( n, a, b, y )

call vdpow( n, a, b, y )

call vcpow( n, a, b, y )

call vzpow( n, a, b, y )

C:

vsPow( n, a, b, y );

vdPow( n, a, b, y );

vcPow( n, a, b, y );

vzPow( n, a, b, y );

2216


Input Parameters

DescriptionTypeName



C: int.

n


FORTRAN: REAL, INTENT(IN) forvspow

a, b


DOUBLE PRECISION, INTENT(IN)for vdpow

COMPLEX, INTENT(IN) for vcpow

DOUBLE COMPLEX, INTENT(IN) forvzpow

C: const float* for vsPow

const double* for vdPow

C: const MKL_Complex8* forvcPow

const MKL_Complex16* for vzPow

Table 9-4 Precision Overflow Thresholds for Pow Real Function


abs(a[i]) < ( FLT_MAX )1/b[i]single precision

abs(a[i]) < ( DBL_MAX )1/b[i]double precision

NOTE. Overflow can occur also in Pow complex function, but the exact formula is beyondthe scope of this document.

2217

Output Parameters

DescriptionTypeName


FORTRAN: REAL for vspow

DOUBLE PRECISION for vdpow

y


COMPLEX for vcpow

DOUBLE COMPLEX for vzpow

C: float* for vsPow

double* for vdPow

MKL_Complex8* for vcPow

MKL_Complex16* for vzPow

Description

The real function Pow has certain limitations on the input range of a and b parameters.Specifically, if a[i] is positive, then b[i] may be arbitrary. For negative a[i], the value ofb[i] must be integer (either positive or negative).

The complex function Pow has no such input range limitations.

PowxRaises each element of a vector to the constantpower.

Syntax

Fortran:

call vspowx( n, a, b, y )

call vdpowx( n, a, b, y )

call vcpowx( n, a, b, y )

call vzpowx( n, a, b, y )

2218


C:

vsPowx( n, a, b, y );

vdPowx( n, a, b, y );

vcPowx( n, a, b, y );

vzPowx( n, a, b, y );

Input Parameters

DescriptionTypeName

Number of elements to be calculated.FORTRAN: INTEGER, INTENT(IN).

C: int.

n

FORTRAN: Array a that specifies the inputvector

FORTRAN: REAL, INTENT(IN) forvspowx

a


DOUBLE PRECISION, INTENT(IN)for vdpowx

COMPLEX, INTENT(IN) for vcpowx

DOUBLE COMPLEX, INTENT(IN) forvzpowx

C: const float* for vsPowx

const double* for vdPowx

const MKL_Complex8* for vcPowx

const MKL_Complex16* forvzPowx

FORTRAN: Scalar value b that is theconstant power.

FORTRAN: REAL, INTENT(IN) forvspowx

b

C: Constant value for power b.DOUBLE PRECISION, INTENT(IN)for vdpowx

COMPLEX, INTENT(IN) for vcpowx

DOUBLE COMPLEX, INTENT(IN) forvzpowx

2219


DescriptionTypeName

C: const float for vsPowx

const double for vdPowx

const MKL_Complex8* for vcPowx

const MKL_Complex16* forvzPowx

Table 9-5 Precision Overflow Thresholds for Powx Real Function


abs(a[i]) < ( FLT_MAX )1/bsingle precision

abs(a[i]) < ( DBL_MAX )1/bdouble precision

NOTE. Overflow can occur also in Powx complex function, but the exact formula isbeyond the scope of this document.

Output Parameters

DescriptionTypeName


FORTRAN: REAL for vspowx

DOUBLE PRECISION for vdpowx

y


COMPLEX for vcpowx

DOUBLE COMPLEX for vzpowx

C: float* for vsPowx

double* for vdPowx

MKL_Complex8* for vcPowx

MKL_Complex16* for vzPowx

2220

Description

The real function Powx has certain limitations on the input range of a and b parameters.Specifically, if a[i] is positive, then b may be arbitrary. For negative a[i], the value of b mustbe integer (either positive or negative).

The complex function Powx has no such input range limitations.

HypotComputes a square root of sum of two squaredelements.

Syntax

Fortran:

call vshypot( n, a, b, y )

call vdhypot( n, a, b, y )

C:

vsHypot( n, a, b, y );

vdHypot( n, a, b, y );

Input Parameters

DescriptionTypeName

Number of elements to be calculated.FORTRAN: INTEGER, INTENT(IN).

C: int.

n

FORTRAN: Arrays that specify the inputvectors a and b

FORTRAN: REAL, INTENT(IN) forvshypot

a,b


DOUBLE PRECISION, INTENT(IN)for vdhypot

C: const float* for vsHypot

const double* for vdHypot

2221


Table 9-6 Precision Overflow Thresholds for Hypot Function


abs(a[i]) < sqrt(FLT_MAX)single precision

abs(b[i]) < sqrt(FLT_MAX)

abs(a[i]) < sqrt(DBL_MAX)double precision

abs(b[i]) < sqrt(DBL_MAX)

Output Parameters

DescriptionTypeName


FORTRAN: REAL for vshypot

DOUBLE PRECISION for vdhypot

y


C: float* for vsHypot

double* for vdHypot

ExpComputes an exponential of vector elements.

Syntax

Fortran:

call vsexp( n, a, y )

call vdexp( n, a, y )

call vcexp( n, a, y )

call vzexp( n, a, y )

2222

C:

call vsExp( n, a, y );

call vdExp( n, a, y );

call vcExp( n, a, y );

call vzExp( n, a, y );

Input Parameters

DescriptionTypeName



C: int.

n


FORTRAN: REAL, INTENT(IN) forvsexp

a


DOUBLE PRECISION, INTENT(IN)for vdexp

COMPLEX, INTENT(IN) for vcexp

DOUBLE COMPLEX, INTENT(IN) forvzexp

C: const float* for vsExp

const double* for vdExp

const MKL_Complex8* for vcExp

const MKL_Complex16* for vzExp

Table 9-7 Precision Overflow Thresholds for Exp Real Function


a[i] < Ln( FLT_MAX )single precision

a[i] < Ln( DBL_MAX )double precision

NOTE. Overflow can occur also in Exp complex function, but the exact formula is beyondthe scope of this document.

2223

Output Parameters

DescriptionTypeName


FORTRAN: REAL for vsexp

DOUBLE PRECISION for vdexp

y


COMPLEX for vcexp

DOUBLE COMPLEX for vzexp

C: float* for vsExp

double* for vdExp

MKL_Complex8* for vcExp

MKL_Complex16* for vzExp

LnComputes natural logarithm of vector elements.

Syntax

Fortran:

call vsln( n, a, y )

call vdln( n, a, y )

call vcln( n, a, y )

call vzln( n, a, y )

C:

vsLn( n, a, y );

vdLn( n, a, y );

vcLn( n, a, y );

vzLn( n, a, y );

2224


Input Parameters

DescriptionTypeName



C: int.

n


FORTRAN: REAL, INTENT(IN) forvsln

a


DOUBLE PRECISION, INTENT(IN)for vdln

FORTRAN: COMPLEX, INTENT(IN)for vcln

DOUBLE COMPLEX, INTENT(IN) forvzln

C: const float* for vsLn

const double* for vdLn

const MKL_Complex8* for vcLn

const MKL_Complex16* for vzLn

Output Parameters

DescriptionTypeName


FORTRAN: REAL for vsln

DOUBLE PRECISION for vdln

y


COMPLEX for vcln

DOUBLE COMPLEX for vzln

C: float* for vsLn

double* for vdLn

MKL_Complex8* for vcLn

MKL_Complex16* for vzLn

2225


Log10Computes denary logarithm of vector elements.

Syntax

Fortran:

call vslog10( n, a, y )

call vdlog10( n, a, y )

call vclog10( n, a, y )

call vzlog10( n, a, y )

C:

call vsLog10( n, a, y );

call vdLog10( n, a, y );

call vcLog10( n, a, y );

call vzLog10( n, a, y );

Input Parameters

DescriptionTypeName



C: int.

n


FORTRAN: REAL, INTENT(IN) forvslog10

a


DOUBLE PRECISION, INTENT(IN)for vdlog10

COMPLEX, INTENT(IN) for vclog10

DOUBLE COMPLEX, INTENT(IN) forvzlog10

C: const float* for vsLog10

const double* for vdLog10

2226


DescriptionTypeName

const MKL_Complex8* forvcLog10

const MKL_Complex16* forvzLog10

Output Parameters

DescriptionTypeName


FORTRAN: REAL for vslog10

DOUBLE PRECISION for vdlog10

y


COMPLEX for vclog10

DOUBLE COMPLEX for vzlog10

C: float* for vsLog10

double* for vdLog10

MKL_Complex8* for vcLog10

MKL_Complex16* for vzLog10

CosComputes cosine of vector elements.

Syntax

Fortran:

call vscos( n, a, y )

call vdcos( n, a, y )

call vccos( n, a, y )

call vzcos( n, a, y )

2227


C:

call vsCos( n, a, y );

call vdCos( n, a, y );

call vcCos( n, a, y );

call vzCos( n, a, y );

Input Parameters

DescriptionTypeName



C: int.

n


FORTRAN: REAL, INTENT(IN) forvscos

a


DOUBLE PRECISION, INTENT(IN)for vdcos

COMPLEX, INTENT(IN) for vccos

DOUBLE PRECISION, INTENT(IN)for vzcos

C: const float* for vsCos

const double* for vdCos

C: const MKL_Complex8* forvcCos

const MKL_Complex16* for vzCos

Output Parameters

DescriptionTypeName


FORTRAN: REAL for vscos

DOUBLE PRECISION for vdcos

y


COMPLEX for vccos

2228


DescriptionTypeName

DOUBLE COMPLEX for vzcos

C: float* for vsCos

double* for vdCos

C: MKL_Complex8* for vcCos

MKL_Complex16* for vzCos

SinComputes sine of vector elements.

Syntax

Fortran:

call vssin( n, a, y )

call vdsin( n, a, y )

call vcsin( n, a, y )

call vzsin( n, a, y )

C:

call vsSin( n, a, y );

call vdSin( n, a, y );

call vcSin( n, a, y );

call vzSin( n, a, y );

Input Parameters

DescriptionTypeName



C: int.

n

2229


DescriptionTypeName


FORTRAN: REAL, INTENT(IN) forvssin

a


DOUBLE PRECISION, INTENT(IN)for vdsin

COMPLEX, INTENT(IN) for vcsin

DOUBLE PRECISION, INTENT(IN)for vzsin

C: const MKL_Complex8* forvcSin

const MKL_Complex16* for vzSin

Output Parameters

DescriptionTypeName


FORTRAN: REAL for vssin

DOUBLE PRECISION for vdsin

y


COMPLEX for vcsin

DOUBLE COMPLEX for vzsin

C: float* for vsSin

double* for vdSin

C: MKL_Complex8* for vcSin

MKL_Complex16* for vzSin

2230


SinCosComputes sine and cosine of vector elements.

Syntax

Fortran:

call vssincos( n, a, y, z )

call vdsincos( n, a, y, z )

C:

vsSinCos( n, a, y, z );

vdSinCos( n, a, y, z );

Input Parameters

DescriptionTypeName



C: int.

n


FORTRAN: REAL, INTENT(IN) forvssincos

a


DOUBLE PRECISION, INTENT(IN)for vdsincos

C: const float* for vsSinCos

const double* for vdSinCos

Output Parameters

DescriptionTypeName

FORTRAN: Arrays, specify the outputvectors y (for sine values) and z (for cosinevalues).

FORTRAN: REAL for vssincos

DOUBLE PRECISION for vdsincos

C: float* for vsSinCos

y, z

2231


DescriptionTypeName

C: Pointers to arrays that contain the outputvectors y (for sinevalues) and z(for cosinevalues).

double* for vdSinCos

TanComputes tangent of vector elements.

Syntax

Fortran:

call vstan( n, a, y )

call vdtan( n, a, y )

call vctan( n, a, y )

call vztan( n, a, y )

C:

call vsTan( n, a, y );

call vdTan( n, a, y );

call vcTan( n, a, y );

call vzTan( n, a, y );

Input Parameters

DescriptionTypeName



C: int.

n


FORTRAN: REAL, INTENT(IN) forvstan

a


DOUBLE PRECISION, INTENT(IN)for vdtan

2232


DescriptionTypeName

COMPLEX, INTENT(IN) for vctan

DOUBLE COMPLEX, INTENT(IN) forvztan

C: const float* for vsTan

const double* for vdTan

C: const MKL_Complex8* forvcTan

const MKL_Complex16* for vzTan

Output Parameters

DescriptionTypeName


FORTRAN: REAL for vstan

DOUBLE PRECISION for vdtan

y


COMPLEX for vctan

DOUBLE COMPLEX for vztan

C: float* for vsTan

double* for vdTan

MKL_Complex8* for vcTan

MKL_Complex16* for vzTan

2233


AcosComputes inverse cosine of vector elements.

Syntax

Fortran:

call vsacos( n, a, y )

call vdacos( n, a, y )

call vcacos( n, a, y )

call vzacos( n, a, y )

C:

call vsAcos( n, a, y );

call vdAcos( n, a, y );

call vcAcos( n, a, y );

call vzAcos( n, a, y );

Input Parameters

DescriptionTypeName



C: int.

n


FORTRAN: REAL, INTENT(IN) forvsacos

a


DOUBLE PRECISION, INTENT(IN)for vdacos

COMPLEX, INTENT(IN) for vcacos

DOUBLE COMPLEX, INTENT(IN) forvzacos

C: const float* for vsAcos

const double* for vdAcos

2234


DescriptionTypeName

const MKL_Complex8* for vcAcos

const MKL_Complex16* forvzAcos

Output Parameters

DescriptionTypeName


FORTRAN: REAL for vsacos

DOUBLE PRECISION for vdacos

y


COMPLEX for vcacos

DOUBLE COMPLEX for vzacos

C: float* for vsAcos

double* for vdAcos

MKL_Complex8* for vcAcos

MKL_Complex16* for vzAcos

AsinComputes inverse sine of vector elements.

Syntax

Fortran:

call vsasin( n, a, y )

call vdasin( n, a, y )

call vcasin( n, a, y )

call vzasin( n, a, y )

2235


C:

call vsAsin( n, a, y );

call vdAsin( n, a, y );

call vcAsin( n, a, y );

call vzAsin( n, a, y );

Input Parameters

DescriptionTypeName



C: int.

n


FORTRAN: REAL, INTENT(IN) forvsasin

a


DOUBLE PRECISION, INTENT(IN)for vdasin

COMPLEX, INTENT(IN) for vcasin

DOUBLE COMPLEX, INTENT(IN) forvzasin

C: const float* for vsAsin

const double* for vdAsin

const MKL_Complex8* for vcAsin

const MKL_Complex16* forvzAsin

Output Parameters

DescriptionTypeName


FORTRAN: REAL for vsasin

DOUBLE PRECISION for vdasin

y


COMPLEX for vcasin

2236


DescriptionTypeName

DOUBLE COMPLEX for vzasin

C: float* for vsAsin

double* for vdAsin

MKL_Complex8* for vcAsin

MKL_Complex16* for vzAsin

AtanComputes inverse tangent of vector elements.

Syntax

Fortran:

call vsatan( n, a, y )

call vdatan( n, a, y )

call vcatan( n, a, y )

call vzatan( n, a, y )

C:

call vsAtan( n, a, y );

call vdAtan( n, a, y );

call vcAtan( n, a, y );

call vzAtan( n, a, y );

Input Parameters

DescriptionTypeName



C: int.

n

2237


DescriptionTypeName


FORTRAN: REAL, INTENT(IN) forvsatan

a


DOUBLE PRECISION, INTENT(IN)for vdatan

COMPLEX, INTENT(IN) for vcatan

DOUBLE COMPLEX, INTENT(IN) forvzatan

C: const float* for vsAtan

const double* for vdAsin

const MKL_Complex8* for vcAtan

const MKL_Complex16* forvzAsin

Output Parameters

DescriptionTypeName


FORTRAN: REAL for vsatan

DOUBLE PRECISION for vdatan

y


COMPLEX for vcatan

DOUBLE COMPLEX for vzatan

C: float* for vsAtan

double* for vdAtan

MKL_Complex8* for vcAtan

MKL_Complex16* for vzAtan

2238


Atan2Computes four-quadrant inverse tangent ofelements of two vectors.

Syntax

Fortran:

call vsatan2( n, a, b, y )

call vdatan2( n, a, b, y )

C:

vsAtan2( n, a, b, y );

vdAtan2( n, a, b, y );

Input Parameters

DescriptionTypeName



C: int.

n


FORTRAN: REAL, INTENT(IN) forvsatan2

a, b


DOUBLE PRECISION, INTENT(IN)for vdatan2

C: const float* for vsAtan2

const double* for vdAtan2

Output Parameters

DescriptionTypeName


FORTRAN: REAL for vsatan2

DOUBLE PRECISION for vdatan2

y

2239


DescriptionTypeName


C: float* for vsAtan2

double* for vdAtan2

Description

The elements of the output vector y are computed as the four-quadrant arctangent of a[i]/b[i].

CoshComputes hyperbolic cosine of vector elements.

Syntax

Fortran:

call vscosh( n, a, y )

call vdcosh( n, a, y )

call vccosh( n, a, y )

call vzcosh( n, a, y )

C:

call vsCosh( n, a, y );

call vdCosh( n, a, y );

call vcCosh( n, a, y );

call vzCosh( n, a, y );

Input Parameters

DescriptionTypeName



C: int.

n

2240


DescriptionTypeName


FORTRAN: REAL, INTENT(IN) forvscosh

a


DOUBLE PRECISION, INTENT(IN)for vdcosh

COMPLEX, INTENT(IN) for vccosh

DOUBLE COMPLEX, INTENT(IN) forvzcosh

C: const float* for vsCosh

const double* for vdCosh

const MKL_Complex8* for vcCosh

const MKL_Complex16* forvzCosh

Table 9-8 Precision Overflow Thresholds for Cosh Real Function


-Ln(FLT_MAX)-Ln2 <a[i] < Ln(FLT_MAX)+Ln2single precision

-Ln(DBL_MAX)-Ln2 <a[i] < Ln(DBL_MAX)+Ln2double precision

NOTE. Overflow can occur also in Cosh complex function, but the exact formula isbeyond the scope of this document.

Output Parameters

DescriptionTypeName


FORTRAN: REAL for vscosh

DOUBLE PRECISION for vdcosh

y


COMPLEX for vccosh

DOUBLE COMPLEX for vzcosh

C: float* for vsCosh

2241

DescriptionTypeName

double* for vdCosh

MKL_Complex8* for vcCosh

MKL_Complex16* for vzCosh

SinhComputes hyperbolic sine of vector elements.

Syntax

Fortran:

call vssinh( n, a, y )

call vdsinh( n, a, y )

call vcsinh( n, a, y )

call vzsinh( n, a, y )

C:

call vsSinh( n, a, y );

call vdSinh( n, a, y );

call vcSinh( n, a, y );

call vzSinh( n, a, y );

Input Parameters

DescriptionTypeName



C: int.

n


FORTRAN: REAL, INTENT(IN) forvssinh

a


DOUBLE PRECISION, INTENT(IN)for vdsinh

2242


DescriptionTypeName

COMPLEX, INTENT(IN) for vcsinh

DOUBLE COMPLEX, INTENT(IN) forvzsinh

C: const float* for vsSinh

const double* for vdSinh

const MKL_Complex8* for vcSinh

const MKL_Complex16* forvzSinh

Table 9-9 Precision Overflow Thresholds for Sinh Real Function


-Ln(FLT_MAX)-Ln2 <a[i] < Ln(FLT_MAX)+Ln2single precision

-Ln(DBL_MAX)-Ln2 <a[i] < Ln(DBL_MAX)+Ln2double precision

NOTE. Overflow can occur also in Sinh complex function, but the exact formula isbeyond the scope of this document.

Output Parameters

DescriptionTypeName


FORTRAN: REAL for vssinh

DOUBLE PRECISION for vdsinh

y


COMPLEX for vcsinh

DOUBLE COMPLEX for vzsinh

C: float* for vsSinh

double* for vdSinh

MKL_Complex8* for vcSinh

MKL_Complex16* for vzSinh

2243

TanhComputes hyperbolic tangent of vector elements.

Syntax

Fortran:

call vstanh( n, a, y )

call vdtanh( n, a, y )

call vctanh( n, a, y )

call vztanh( n, a, y )

C:

call vsTanh( n, a, y );

call vdTanh( n, a, y );

call vcTanh( n, a, y );

call vzTanh( n, a, y );

Input Parameters

DescriptionTypeName



C: int.

n


FORTRAN: REAL, INTENT(IN) forvstanh

a


DOUBLE PRECISION, INTENT(IN)for vdtanh

COMPLEX, INTENT(IN) for vctanh

DOUBLE COMPLEX, INTENT(IN) forvztanh

C: const float* for vsTanh

const double* for vdTanh

2244


DescriptionTypeName

const MKL_Complex8* for vcTanh

const MKL_Complex16* forvzTanh

Output Parameters

DescriptionTypeName


FORTRAN: REAL for vstanh

DOUBLE PRECISION for vdtanh

y


COMPLEX for cstanh

DOUBLE COMPLEX for zdtanh

C: float* for vsTanh

double* for vdTanh

MKL_Complex8* for vcTanh

MKL_Complex16* for vzTanh

AcoshComputes inverse hyperbolic cosine (nonnegative)of vector elements.

Syntax

Fortran:

call vsacosh( n, a, y )

call vdacosh( n, a, y )

call vcacosh( n, a, y )

call vzacosh( n, a, y )

2245


C:

call vsAcosh( n, a, y );

call vdAcosh( n, a, y );

call vcAcosh( n, a, y );

call vzAcosh( n, a, y );

Input Parameters

DescriptionTypeName



C: int.

n


FORTRAN: REAL, INTENT(IN) forvsacosh

a


DOUBLE PRECISION, INTENT(IN)for vdacosh

COMPLEX, INTENT(IN) for vcacosh

DOUBLE COMPLEX, INTENT(IN) forvzacosh

C: const float* for vsAcosh

const double* for vdAcosh

const MKL_Complex8* forvcAcosh

const MKL_Complex16* forvzAcosh

Output Parameters

DescriptionTypeName


FORTRAN: REAL for vsacosh

DOUBLE PRECISION for vdacosh

y

2246


DescriptionTypeName


COMPLEX for vcacosh

DOUBLE COMPLEX for vzacosh

C: float* for vsAcosh

double* for vdAcosh

MKL_Complex8* for vcAcosh

MKL_Complex16* for vzAcosh

AsinhComputes inverse hyperbolic sine of vectorelements.

Syntax

Fortran:

call vsasinh( n, a, y )

call vdasinh( n, a, y )

call vcasinh( n, a, y )

call vzasinh( n, a, y )

C:

call vsAsinh( n, a, y );

call vdAsinh( n, a, y );

call vcAsinh( n, a, y );

call vzAsinh( n, a, y );

Input Parameters

DescriptionTypeName



C: int.

n

2247


DescriptionTypeName


FORTRAN: REAL, INTENT(IN) forvsasinh

a


DOUBLE PRECISION, INTENT(IN)for vdasinh

COMPLEX, INTENT(IN) for vcasinh

DOUBLE COMPLEX, INTENT(IN) forvzasinh

C: const float* for vsAsinh

const double* for vdAsinh

const MKL_Complex8* forvcAsinh

const MKL_Complex16* forvzAsinh

Output Parameters

DescriptionTypeName


FORTRAN: REAL for vsasinh

DOUBLE PRECISION for vdasinh

y


COMPLEX for vcasinh

DOUBLE COMPLEX for vzasinh

C: float* for vsAsinh

double* for vdAsinh

MKL_Complex8* for vcAsinh

MKL_Complex16* for vzAsinh

2248


AtanhComputes inverse hyperbolic tangent of vectorelements.

Syntax

Fortran:

call vsatanh( n, a, y )

call vdatanh( n, a, y ) call vcatanh( n, a, y ) call vzatanh( n, a, y )

C:

call vsAtanh( n, a, y );

call vdAtanh( n, a, y ); call vcAtanh( n, a, y ); call vzAtanh( n, a, y );

Input Parameters

DescriptionTypeName



C: int.

n


FORTRAN: REAL, INTENT(IN) forvsatanh

a


DOUBLE PRECISION, INTENT(IN)for vdatanh

COMPLEX, INTENT(IN) for vcatanh

DOUBLE COMPLEX, INTENT(IN) forvzatanh

C: const float* for vsAtanh

const double* for vdAtanh

const MKL_Complex8* forvcAtanh

const MKL_Complex16* forvzAtanh

2249


Output Parameters

DescriptionTypeName


FORTRAN: REAL for vsatanh

DOUBLE PRECISION for vdatanh

y


COMPLEX for vcatanh

DOUBLE COMPLEX for vzatanh

C: float* for vsAtanh

double* for vdAtanh

MKL_Complex8* for vcAtanh

MKL_Complex16* for vzAtanh

ErfComputes the error function value of vectorelements.

Syntax

Fortran:

call vserf( n, a, y )

call vderf( n, a, y )

C:

call vsErf( n, a, y );

call vdErf( n, a, y );

Input Parameters

DescriptionTypeName



C: int.

n

2250


DescriptionTypeName


FORTRAN: REAL, INTENT(IN) forvserf

a


DOUBLE PRECISION, INTENT(IN)for vderf

C: const float* for vsErf

const double* for vdErf

Output Parameters

DescriptionTypeName


FORTRAN: REAL for vserf

DOUBLE PRECISION for vderf

y


C: float* for vsErf

double* for vdErf

Description

The function Erf computes the error function values for elements of the input vector a andwrites them to the output vector y.

The error function is defined as given by:

2251


ErfcComputes the complementary error function valueof vector elements.

Syntax

Fortran:

call vserfc( n, a, y )

call vderfc( n, a, y )

C:

call vsErfc( n, a, y );

call vdErfc( n, a, y );

Input Parameters

DescriptionTypeName



C: int.

n


FORTRAN: REAL, INTENT(IN) forvserfc

a


DOUBLE PRECISION, INTENT(IN)for vderfc

C: const float* for vsErfc

const double* for vdErfc

Output Parameters

DescriptionTypeName


FORTRAN: REAL for vserfc

DOUBLE PRECISION for vderfc

y

2252


DescriptionTypeName


C: float* for vsErfc

double* for vdErfc

Description

The function Erfc computes the complimentary error function values for elements of the inputvector a and writes them to the output vector y.

The error function is defined as given by:

erfc(x) = 1 - erf(x)

or, in other words,

ErfInvComputes inverse error function value of vectorelements.

Syntax

Fortran:

call vserfinv( n, a, y )

call vderfinv( n, a, y )

C:

call vsErfInv( n, a, y );

call vdErfInv( n, a, y );

2253


Input Parameters

DescriptionTypeName



C: int.

n


FORTRAN: REAL, INTENT(IN) forvserfinv

a


DOUBLE PRECISION, INTENT(IN)for vderfinv

C: const float* for vsErfInv

const double* for vdErfInv

Output Parameters

DescriptionTypeName


FORTRAN: REAL for vserfinv

DOUBLE PRECISION for vderfinv

y


C: float* for vsErfInv

double* for vdErfInv

Description

The function ErfInv computes the inverse error function values for elements of the input vectora and writes them to the output vector y.

The inverse error function is defined as given by:

erfinv(x) = erf-1(x),

where erf(x) denotes the error function defined as given by

2254


FloorComputes a rounded towards minus infinity integervalue for each vector element.

Syntax

Fortran:

call vsfloor( n, a, y )

call vdfloor( n, a, y )

C:

call vsFloor( n, a, y );

call vdFloor( n, a, y );

Input Parameters

DescriptionTypeName



C: int.

n


FORTRAN: REAL, INTENT(IN) forvsfloor

a


DOUBLE PRECISION, INTENT(IN)for vdfloor

C: const float* for vsFloor

const double* for vdFloor

Output Parameters

DescriptionTypeName


FORTRAN: REAL for vsfloor

DOUBLE PRECISION for vdfloor

y

2255


DescriptionTypeName


C: float* for vsFloor

double* for vdFloor

CeilComputes a rounded towards plus infinity integervalue for each vector element.

Syntax

Fortran:

call vsceil( n, a, y )

call vdceil( n, a, y )

C:

call vsCeil( n, a, y );

call vdCeil( n, a, y );

Input Parameters

DescriptionTypeName



C: int.

n


FORTRAN: REAL, INTENT(IN) forvsceil

a


DOUBLE PRECISION, INTENT(IN)for vdceil

C: const float* for vsCeil

const double* for vdCeil

2256


Output Parameters

DescriptionTypeName


FORTRAN: REAL for vsceil

DOUBLE PRECISION for vdceil

y


C: float* for vsCeil

double* for vdCeil

TruncComputes a rounded towards zero integer valuefor each vector element.

Syntax

Fortran:

call vstrunc( n, a, y )

call vdtrunc( n, a, y )

C:

call vsTrunc( n, a, y );

call vdTrunc( n, a, y );

Input Parameters

DescriptionTypeName



C: int.

n


FORTRAN: REAL, INTENT(IN) forvstrunc

a


DOUBLE PRECISION, INTENT(IN)for vdtrunc

C: const float* for vsTrunc

2257


DescriptionTypeName

const double* for vdTrunc

Output Parameters

DescriptionTypeName


FORTRAN: REAL for vstrunc

DOUBLE PRECISION for vdtrunc

y


C: float* for vsTrunc

double* for vdTrunc

RoundComputes a rounded to nearest integer value foreach vector element.

Syntax

Fortran:

call vsround( n, a, y )

call vdround( n, a, y )

C:

call vsRound( n, a, y );

call vdRound( n, a, y );

Input Parameters

DescriptionTypeName



C: int.

n


FORTRAN: REAL, INTENT(IN) forvsround

a

2258


DescriptionTypeName


DOUBLE PRECISION, INTENT(IN)for vdround

C: const float* for vsRound

const double* for vdRound

Output Parameters

DescriptionTypeName


FORTRAN: REAL for vsround

DOUBLE PRECISION for vdround

y


C: float* for vsRound

double* for vdRound

Description

Halfway values, that is, 0.5, -1.5, and the like, are rounded off away from zero. That is, 0.5-> 1, -1.5 -> -2, etc.

NearbyIntComputes a rounded integer value in a currentrounding mode for each vector element.

Syntax

Fortran:

call vsnearbyint( n, a, y )

call vdnearbyint( n, a, y )

C:

call vsNearbyInt( n, a, y );

call vdNearbyInt( n, a, y );

2259


Input Parameters

DescriptionTypeName



C: int.

n


FORTRAN: REAL, INTENT(IN) forvsnearbyint

a


DOUBLE PRECISION, INTENT(IN)for vdnearbyint

C: const float* for vsNearbyInt

const double* for vdNearbyInt

Output Parameters

DescriptionTypeName


FORTRAN: REAL for vsnearbyint

DOUBLE PRECISION forvdnearbyint

y


C: float* for vsNearbyInt

double* for vdNearbyInt

Description

Halfway values, that is, 0.5, -1.5, and the like, are rounded off towards even values.

2260


RintComputes a rounded integer value in a currentrounding mode for each vector element withinexact result exception raised for each changedvalue.

Syntax

Fortran:

call vsrint( n, a, y )

call vdrint( n, a, y )

C:

call vsRint( n, a, y );

call vdRint( n, a, y );

Input Parameters

DescriptionTypeName



C: int.

n


FORTRAN: REAL, INTENT(IN) forvsrint

a


DOUBLE PRECISION, INTENT(IN)for vdrint

C: const float* for vsRint

const double* for vdRint

Output Parameters

DescriptionTypeName


FORTRAN: REAL for vsrint

DOUBLE PRECISION for vdrint

y

2261


DescriptionTypeName


C: float* for vsRint

double* for vdRint

Description

Halfway values, that is, 0.5, -1.5, and the like, are rounded off towards even values. For eachchanged value, inexact result exception is raised.

ModfComputes a truncated integer value and remainingfraction part for each vector element.

Syntax

Fortran:

call vsmodf( n, a, y, z )

call vdmodf( n, a, y, z )

C:

call vsModf( n, a, y, z );

call vdModf( n, a, y, z );

Input Parameters

DescriptionTypeName



C: int.

n


FORTRAN: REAL, INTENT(IN) forvsmodf

a


DOUBLE PRECISION, INTENT(IN)for vdmodf

C: const float* for vsModf

2262


DescriptionTypeName

const double* for vdModf

Output Parameters

DescriptionTypeName

FORTRAN: Array, specifies the outputvector y and z.

FORTRAN: REAL for vsmodf

DOUBLE PRECISION for vdmodf

y, z

C: Pointer to an array that contains theoutput vector y and z.

C: float* for vsModf

double* for vdModf

VML Pack/Unpack FunctionsThis section describes VML functions which convert vectors with unit increment to and fromvectors with positive increment indexing, vector indexing and mask indexing (see Appendix Bfor details on vector indexing methods).

Table 9-10 lists available VML Pack/Unpack functions, together with data types and indexingmethods associated with them.

Table 9-10 VML Pack/Unpack Functions

DescriptionIndexingMethods

DataTypes

Function ShortName

Gathers elements of arrays, indexed by differentmethods.

I,V,Ms, dPack

Scatters vector elements to arrays with different indexing.I,V,Ms, dUnpack

2263


PackCopies elements of an array with specified indexingto a vector with unit increment.

Syntax

Fortran:

call vsPackI( n, a, inca, y )

call vsPackV( n, a, ia, y )

call vsPackM( n, a, ma, y )

call vdPackI( n, a, inca, y )

call vdPackV( n, a, ia, y )

call vdPackM( n, a, ma, y )

C:

vsPackI( n, a, inca, y );

vsPackV( n, a, ia, y );

vsPackM( n, a, ma, y );

vdPackI( n, a, inca, y );

vdPackV( n, a, ia, y );

vdPackM( n, a, ma, y );

Input Parameters

DescriptionTypeName



C: int.

n

FORTRAN: Array, DIMENSION at least(1 +(n-1)*inca) for vspacki/vdpacki,

FORTRAN: REAL, INTENT(IN) forvspacki, vspackv, vspackm

a

2264


DescriptionTypeName

Array, DIMENSION at least max(n,max(ia[j]) ), j=0, …, n-1 forvspackv/vdpackv ,

DOUBLE PRECISION, INTENT(IN)for vdpacki, vdpackv, vdpackm.

C: const float* for vsPackI,vsPackV, vsPackM Array, DIMENSION at least n for

vspackm/vdpackm.const double* for vdPackI,vdPackV, vdPackM Specifies the input vector a.

C: Specifies pointer to an array that containsthe input vector a. Size of the array mustbe:

at least(1 + (n-1)*inca) forvsPackI/vdPackI,

at least max( n,max(ia[j]) ), j=0, …,n-1 for vsPackV/vdPackV ,

at least n for vsPackM/vdPackM.

Specifies the increment for the elements ofa.

FORTRAN: INTEGER, INTENT(IN)for vspacki, vdpacki.

C: int for vsPackI, vdPackI.

inca

FORTRAN: Array, DIMENSION at least n.FORTRAN: INTEGER, INTENT(IN)for vspackv, vdpackv.

ia

Specifies the index vector for the elementsof a.C: const int* for vsPackV,

vdPackV. C: Specifies the pointer to an array of sizeat least n that contains the index vector forthe elements of a.

FORTRAN: Array, DIMENSION at least n,FORTRAN: INTEGER, INTENT(IN)for vspackm, vdpackm.

ma

Specifies the mask vector for the elementsof a.C: const int* for vsPackM,

vdPackM. C: Specifies the pointer to an array of sizeat least n that contains the mask vector forthe elements of a.

2265


Output Parameters

DescriptionTypeName

FORTRAN: Array, DIMENSION at least n.Specifies the output vector y.

FORTRAN: REAL for vspacki,vspackv, vspackm

y

C: Pointer to an array of size at least n thatcontains the output vector y.

DOUBLE PRECISION for vdpacki,vdpackv, vdpackm.

C: float* for vsPackI, vsPackV,vsPackM

double* for vdPackI, vdPackV,vdPackM

UnpackCopies elements of a vector with unit increment toan array with specified indexing.

Syntax

Fortran:

call vsunpacki( n, a, y, incy )

call vsunpackv( n, a, y, iy )

call vsunpackm( n, a, y, my )

call vdunpacki( n, a, y, incy )

call vdunpackv( n, a, y, iy )

call vdunpackm( n, a, y, my )

2266


C:

vsUnpackI( n, a, y, incy );

vsUnpackV( n, a, y, iy );

vsUnpackM( n, a, y, my );

vdUnpackI( n, a, y, incy );

vdUnpackV( n, a, y, iy );

vdUnpackM( n, a, y, my );

Input Parameters

DescriptionTypeName



C: int.

n

FORTRAN: Array, DIMENSION at least n.FORTRAN: REAL, INTENT(IN) forvsunpacki, vsunpackv,vsunpackm

a

Specifies the input vector a.

C: Specifies the pointer to an array of sizeat least n that contains the input vector a.DOUBLE PRECISION, INTENT(IN)

for vdunpacki, vdunpackv,vdunpackm.

C: const float* for vsUnpackI,vsUnpackV, vsUnpackM

const double* for vdUnpackI,vdUnpackV, vdUnpackM

Specifies the increment for the elements ofy.

FORTRAN: INTEGER, INTENT(IN)for vsunpacki, vdunpacki.

C: int for vsUnpackI, vdUnpackI.

incy

FORTRAN: Array, DIMENSION at least n.FORTRAN: INTEGER, INTENT(IN)for vsunpackv, vdunpackv.

iy

Specifies the index vector for the elementsof y.C: const int* for vsUnpackV,

vdUnpackV.

2267


DescriptionTypeName

C: Specifies the pointer to an array of sizeat least n that contains the index vector forthe elements of a.

FORTRAN: Array, DIMENSION at least n,FORTRAN: INTEGER, INTENT(IN)for vsunpackm, vdunpackm.

my

Specifies the mask vector for the elementsof y.C: const int* for vsUnpackM,

vdUnpackM. C: Specifies the pointer to an array of sizeat least n that contains the mask vector forthe elements of a.

Output Parameters

DescriptionTypeName

FORTRAN: Array, DIMENSIONFORTRAN: REAL for vsunpacki,vsunpackv, vsunpackm

y

at least (1 + (n-1)*incy) for vsunpacki,DOUBLE PRECISION forvdunpacki, vdunpackv,vdunpackm.

at least max( n,max(iy[j]) ),j=0,...,n-1, for vsunpackv,

at least n for vsunpackm/vdunpackmC: float* for vsUnpackI,vsUnpackV, vsUnpackM C: Specifies the pointer to an array that

contains the output vector y.double* for vdUnpackI,vdUnpackV, vdUnpackM Size of the array must be:

at least (1 + (n-1)*incy) forvsUnPackI,

at least max( n,max(ia[j])),j=0,..., n-1, for vsUnPackV,

at least n for vsUnPackM.

2268


VML Service FunctionsVML Service functions allow the user to set /get the accuracy mode, and set/get the error code.All these functions are available both in Fortran- and C- interfaces. Table 9-11 lists availableVML Service functions and their short description.

Table 9-11 VML Service Functions

DescriptionFunction Short Name

Sets the VML modeSetModeGets the VML modeGetModeSets the VML error statusSetErrStatusGets the VML error statusGetErrStatusClears the VML error statusClearErrStatusSets the additional error handler callback functionSetErrorCallBackGets the additional error handler callback functionGetErrorCallBackDeletes the additional error handler callback functionClearErrorCallBack

SetModeSets a new mode for VML functions according tomode parameter and stores the previous VML modeto oldmode.

Syntax

Fortran:

oldmode = vmlsetmode( mode )

C:

oldmode = vmlSetMode( mode );

Input Parameters

DescriptionTypeName

Specifies the VML mode to be set.FORTRAN: INTEGER, INTENT(IN).

C: int.

mode

2269


Output Parameters

DescriptionTypeName

Specifies the former VML mode.FORTRAN: INTEGER.

C: int.

oldmode

Description

The mode parameter is designed to control accuracy, FPU, error handling and threading options.Table 9-12 lists values of the mode parameter. All other possible values of the mode parametermay be obtained from these values by using bitwise OR ( | ) operation to combine one valuefor accuracy, one for FPU, and one for error control options. The default value of the modeparameter is VML_HA | VML_ERRMODE_DEFAULT | VML_NUM_THREADES_OMP_AUTO. Thus, thecurrent FPU control word (FPU precision and the rounding method) is used by default.

If any VML mathematical function requires different FPU precision, or rounding method, itchanges these options automatically and then restores the former values. The mode parameterenables you to minimize switching the internal FPU mode inside each VML mathematical functionthat works with similar precision and accuracy settings. To accomplish this, set the modeparameter to VML_FLOAT_CONSISTENT for single precision real and complex functions, or toVML_DOUBLE_CONSISTENT for double precision real and complex functions. These values of themode parameter are the optimal choice for the respective function groups, as they are requiredfor most of the VML mathematical functions. After the execution is over, set the mode toVML_RESTORE if you need to restore the previous FPU mode.

Table 9-12 Values of the mode Parameter

DescriptionValue of mode

Accuracy Control

High accuracy versions of VML functions will be usedVML_HA

Low accuracy versions of VML functions will be usedVML_LA

Additional FPU Mode Control

The optimal FPU mode (control word) for single precisionfunctions is set, and the previous FPU mode is saved

VML_FLOAT_CONSISTENT

The optimal FPU mode (control word) for double precisionfunctions is set, and the previous FPU mode is saved

VML_DOUBLE_CONSISTENT

The previously saved FPU mode is restoredVML_RESTORE

Error Mode Control

2270


DescriptionValue of mode

No action is set for computation errorsVML_ERRMODE_IGNORE

On error, the errno variable is setVML_ERRMODE_ERRNO

On error, the error text information is written to stderrVML_ERRMODE_STDERR

On error, an exception is raisedVML_ERRMODE_EXCEPT

On error, an additional error handler function is calledVML_ERRMODE_CALLBACK

On error, the errno variable is set, an exception is raised,

and an additional error handler function is called.

VML_ERRMODE_DEFAULT

Treading Mode Control

This is default behavior. Maximum number of threads isdetermined by environmental variable OMP_NUM_THREADSand can be overridden by OpenMP* function

omp_set_num_threads(). For performance reasons VML

threading logic can use fewer number of threads.

VML_NUM_THREADS_OMP_AUTO

Number of threads is determined by environmental variableOMP_NUM_THREADS and can be overridden by OpenMP*

function omp_set_num_threads(). Use this mode to

disable VML threading logic.

VML_NUM_THREADS_OMP_FIXED

Examples

Several examples of calling the function vmlSetMode() with different values of the modeparameter are given below:

Fortran:

oldmode = vmlsetmode( VML_LA )

call vmlsetmode( IOR(VML_LA, IOR(VML_FLOAT_CONSISTENT,VML_ERRMODE_IGNORE )))

call vmlsetmode( VML_RESTORE)

call vmlsetmode( VML_NUM_THREADS_OMP_FIXED)

2271


C:

vmlSetMode( VML_LA );

vmlSetMode( VML_LA | VML_FLOAT_CONSISTENT |VML_ERRMODE_IGNORE );

vmlSetMode( VML_RESTORE);

vmlSetMode( VML_NUM_THREADS_OMP_FIXED);

GetModeGets the VML mode.

Syntax

Fortran:

mod = vmlgetmode()

C:

mod = vmlGetMode( void );

Output Parameters

DescriptionTypeName

Specifies the packed mode parameter.FORTRAN: INTEGER.

C: int.

mod

Description

The function vmlGetMode() returns the VML mode parameter which controls accuracy, FPUanderror handling options. The mod variable value is some combination of the values listed in theTable 9-12 . You can obtain some of these values using the respective mask from the Table9-13

Table 9-13 Values of Mask for the mode Parameter

DescriptionValue of mask

Specifies mask for accuracy mode selection.VML_ACCURACY_MASK

Specifies mask for FPU mode selection.VML_FPUMODE_MASK

2272


DescriptionValue of mask

Specifies mask for error mode selection.VML_ERRMODE_MASK

See example below:

Examples

Fortran:

mod = vmlgetmode()

accm = IAND(mod, VML_ACCURACY_MASK)

fpum = IAND(mod, VML_FPUMODE_MASK)

errm = IAND(mod, VML_ERRMODE_MASK)

C:

accm = vmlGetMode(void )& VML_ACCURACY_MASK;

fpum = vmlGetMode(void )& VML_FPUMODE_MASK;

errm = vmlGetMode(void )& VML_ERRMODE_MASK;

SetErrStatusSets the new VML error status according to errand stores the previous VML error status to olderr.

Syntax

Fortran:

olderr = vmlseterrstatus( err )

C:

olderr = vmlSetErrStatus( err );

2273


Input Parameters

DescriptionTypeName

Specifies the VML error status to be set.FORTRAN: INTEGER, INTENT(IN).

C: int.

err

Output Parameters

DescriptionTypeName

Specifies the former VML error status.FORTRAN: INTEGER.

C: int.

olderr

Table 9-14 lists possible values of the err parameter.

Table 9-14 Values of the VML Error Status

DescriptionError Status

The execution was completed successfully.VML_STATUS_OK

The array dimension is not positive.VML_STATUS_BADSIZE

NULL pointer is passed.VML_STATUS_BADMEM

At least one of array values is out of a range ofdefinition.

VML_STATUS_ERRDOM

At least one of array values caused a singularity.VML_STATUS_SING

An overflow has happened during the calculationprocess.

VML_STATUS_OVERFLOW

An underflow has happened during the calculationprocess.

VML_STATUS_UNDERFLOW

ExamplesvmlSetErrStatus( VML_STATUS_OK );

vmlSetErrStatus( VML_STATUS_ERRDOM );

vmlSetErrStatus( VML_STATUS_UNDERFLOW );

2274


GetErrStatusGets the VML error status.

Syntax

Fortran:

err = vmlgeterrstatus( )

C:

err = vmlGetErrStatus( void );

Output Parameters

DescriptionTypeName

Specifies the VML error status.FORTRAN: INTEGER.

C: int.

err

ClearErrStatusSets the VML error status to VML_STATUS_OK andstores the previous VML error status to olderr.

Syntax

Fortran:

olderr = vmlclearerrstatus( )

C:

olderr = vmlClearErrStatus( void );

Output Parameters

DescriptionTypeName

Specifies the former VML error status.FORTRAN: INTEGER.

C: int.

olderr

2275


SetErrorCallBackSets the additional error handler callback functionand gets the old callback function.

Syntax

Fortran:

oldcallback = vmlseterrorcallback( callback )

C:

oldcallback = vmlSetErrorCallBack( callback );

Input Parameters

DescriptionTypeName

FORTRAN: The callback function has thefollowing format:

INTEGER FUNCTION ERRFUNC(par)

TYPE (ERROR_STRUCTURE) par

! ...

! user error processing

! ...

ERRFUNC = 0

! if ERRFUNC= 0 - standard VMLerror handler

! is called after the callback

! if ERRFUNC != 0 - standard VMLerror handler

! is not called

END

FORTRAN: Address of the callbackfunction.

callback

2276


DescriptionTypeName

The passed error structure is defined asfollows:

TYPE ERROR_STRUCTURE SEQUENCE

INTEGER*4 ICODE

INTEGER*4 IINDEX

REAL*8 DBA1

REAL*8 DBA2

REAL*8 DBR1

REAL*8 DBR2

CHARACTER(64) CFUNCNAME

INTEGER*4 IFUNCNAMELEN

END TYPE ERROR_STRUCTURE

C: The callback function has the followingformat:

static int __stdcallMyHandler(DefVmlErrorContext*

pContext)

{

/* Handler body */

};

C: Pointer to the callback function.callback

2277


DescriptionTypeName

The passed error structure is defined asfollows:

typedef struct _DefVmlErrorContext

{

int iCode;/* Error status value */

int iIndex;/* Index for bad array

element, or bad array

dimension, or bad

array pointer */

double dbA1; /* Error argument 1 */

double dbA2; /* Error argument 2 */

double dbR1; /* Error result 1 */

double dbR2; /* Error result 2 */

char cFuncName[64]; /* Functionname */

int iFuncNameLen; /* Length offunctionname*/

} DefVmlErrorContext;

Output Parameters

DescriptionTypeName

FORTRAN: Address of the former callbackfunction.

FORTRAN: INTEGER

C: int

oldcallback

C: Pointer to the former callback function.

Description

The callback function is called on each VML mathematical function error ifVML_ERRMODE_CALLBACK error mode is set (see Table 9-12 ).

2278


Use the vmlSetErrorCallBack() function if you need to define your own callback functioninstead of default empty callback function.

The input structure for a callback function contains the following information about the errorencountered:

• the input value that caused an error

• location (array index) of this value

• the computed result value

• error code

• name of the function in which the error occurred.

You can insert your own error processing into the callback function. This may include correctingthe passed result values in order to pass them back and resume computation. The standarderror handler is called after the callback function only if it returns 0.

GetErrorCallBackGets the additional error handler callback function.

Syntax

Fortran:

callback = vmlgeterrorcallback( )

C:

callback = vmlGetErrorCallBack( void );

Output Parameters

DescriptionTypeName

FORTRAN: Address of the callback function.callback

C: Pointer to the callback function.

2279


ClearErrorCallBackDeletes the additional error handler callbackfunction and retrieves the former callback function.

Syntax

Fortran:

oldcallback = vmlclearerrorcallback( )

C:

oldcallback = vmlClearErrorCallBack( void );

Output Parameters

DescriptionTypeName

FORTRAN: Address of the former callbackfunction.

FORTRAN: INTEGER.

C: int.

oldcallback

C: Pointer to the former callback function.

2280


10Statistical Functions

Statistical functions in Intel® MKL are known as Vector Statistical Library (VSL) that is designed for thepurpose of

• generating vectors of pseudorandom and quasi-random numbers

• performing mathematical operations of convolution and correlation.

The corresponding functionality is described in the respective Random Number Generators and Convolutionand Correlation sections.

Random Number GeneratorsVSL provides a set of routines implementing commonly used pseudo- or quasi-random numbergenerators with continuous and discrete distribution. To speed up performance, all these routineswere developed using the calls to the highly optimized Basic Random Number Generators (BRNGs)and the library of vector mathematical functions (VML, see Chapter 9, “Vector MathematicalFunctions”).

VSL provides interfaces both for FORTRAN and C languages. For users of the C and C++ languagesthe mkl_vsl.h header file is provided. For users of the FORTRAN-90 or FORTRAN-95 language themkl_vsl.fi header file is provided. Both header files are found in the following directory:

${MKL}/include

The mkl_vsl.fi header is intended for using via the FORTRAN include clause and is compatible withboth standard forms of F90/F95 sources — the free and 72-columns fixed forms. If you need to usethe VSL interface with 80- or 132-columns fixed form sources, you may add a new file to your project.That file is formatted as a 72-columns fixed-form source and consists of a single include clause asfollows:

include ‘mkl_vsl.fi’

This include clause causes the compiler to generate the module files mkl_vsl.mod andmkl_vsl_type.mod, which are used to process the FORTRAN use clauses referencing to the VSLinterface:

use mkl_vsl_type

use mkl_vsl

2281

Because of this specific feature, you do not need to include the mkl_vsl.fi header into eachsource of your project. You only need to include the header into some of the sources. In anycase, make sure that the sources that depend on the VSL interface are compiled after thosethat include the header so that the module files mkl_vsl.mod and mkl_vsl_type.mod aregenerated prior to using them.

NOTE. For FORTRAN interface, VSL provides both subroutine-style interface andfunction-style interface. Default interface in this case is a function-style interface. Defaultinterface in this case is a function-style interface. Subroutine-style interface is providedfor backward compatibility only. To use subroutine-style interface, manually includemkl_vsl_subroutine.fi file instead of mkl_vsl.fi by changing the line include‘mkl_vsl.fi’ in include\mkl.fi with the line include ‘mkl_vsl_subroutine.fi’.

Function-style interface, unlike subroutine-style interface, allows user to get error status ofeach routine.

All VSL routines can be classified into three major categories:

• Transformation routines for different types of statistical distributions, for example, uniform,normal (Gaussian), binomial, etc. These routines indirectly call basic random numbergenerators, which are either pseudorandom number generators or quasi-random numbergenerators. Detailed description of the generators can be found in Distribution Generatorssection.

• Service routines to handle random number streams: create, initialize, delete, copy, save toa binary file, load from a binary file, get the index of a basic generator. The description ofthese routines can be found in Service Subroutines section.

• Registration routines for basic pseudorandom generators and routines that obtain propertiesof the registered generators (see Advanced Service Subroutines section ).

The last two categories are referred to as service routines.

Conventions

This document makes no specific differentiation between random, pseudorandom, andquasi-random numbers, nor between random, pseudorandom, and quasi-random numbergenerators unless the context requires otherwise. For details, refer to ‘Random Numbers’ sectionin VSL Notes document provided with Intel® MKL.

All generators of nonuniform distributions, both discrete and continuous, are built on the basisof the uniform distribution generators, called Basic Random Number Generators (BRNGs). Thepseudorandom numbers with nonuniform distribution are obtained through an appropriate

2282


transformation of the uniformly distributed pseudorandom numbers. Such transformations arereferred to as generation methods. For a given distribution, several generation methods canbe used. See VSL Notes for the description of methods available for each generator.

The stream descriptor specifies which BRNG should be used in a given transformationmethod. See ‘Random Streams and RNGs in Parallel Computation’ section of VSL Notes.

The term computational node means a logical or physical unit that can process data inparallel.

Mathematical Notation

The following notation is used throughout the text:

The set of natural numbers N = {1, 2, 3 ...}.N

The set of integers Z = {... -3, -2, -1, 0, 1, 2, 3 ...}.Z

The set of real numbers.RThe floor of a (the largest integer less than or equal to a).

Bitwise exclusive OR.⊕ or xor

Binomial coefficient or combination (α∈R, α≥ 0; k∈N ∪{0}).

For α≥k binomial coefficient is defined as

If α < k, then

Cumulative Gaussian distribution functionΦ(x)

2283

Statistical Functions 10

defined over - ∞ < x < + ∞.

Φ(-∞) = 0, Φ(+∞) = 1.

The complete gamma functionΓ(α)

where α > 0.

The complete beta functionB(p, q)

where p>0 and q>0.

Linear Congruential Generator xn+1 = (axn + c) mod m, where a iscalled the multiplier, c is called the increment, and m is called themodulus of the generator.

LCG(a,c, m)

Multiplicative Congruential Generator xn+1 = (axn) mod m is a specialcase of Linear Congruential Generator, where the increment c is takento be 0.

MCG(a,m)

Generalized Feedback Shift Register Generator

xn = xn-p ⊕xn-q.

GFSR(p, q)

Naming Conventions

The names of all VSL functions in FORTRAN are lowercase; names in C may contain bothlowercase and uppercase letters.

The names of generator routines have the following structure:

2284

v<type of result>rng<distribution> for FORTRAN-interface

v<type of result>Rng<distribution> for C-interface,

where v is the prefix of a VSL vector function, and the field <type of result> is either s, d,or i and specifies one of the following types:

REAL for FORTRAN-interfacesfloat for C-interface

DOUBLE PRECISION for FORTRAN-interfaceddouble for C-interface

INTEGER for FORTRAN-interfaceiint for C-interface

Prefixes s and d apply to continuous distributions only, prefix i applies only to discrete case.The prefix rng indicates that the routine is a random generator, and the <distribution> fieldspecifies the type of statistical distribution.

Names of service routines follow the template below:vsl<name>,

where vsl is the prefix of a VSL service function. The field <name> contains a short functionname. For a more detailed description of service routines refer to Service Routines and AdvancedService Routines sections.

Prototype of each generator routine corresponding to a given probability distribution fits thefollowing structure:<function name>( method, stream, n, r, [<distribution parameters >] ),

where

• method is the number specifying the method of generation. A detailed description of thisparameter can be found in Distribution Generators section. See the next page for methodname structure definition.

• stream defines the random stream descriptor and must have a nonzero value. Randomstreams and their usage are discussed further in Random Streams and Service Routines.

• n defines the number of random values to be generated. If n is less than or equal to zero,no values are generated. Furthermore, if n is negative, an error condition is set.

• r defines the destination array for the generated numbers. The dimension of the array mustbe large enough to store at least n random numbers.

Additional parameters included into <distribution parameters> field are individual for eachgenerator routine and are described in detail in Distribution Generators section.

2285

To invoke a distribution generator, use a call to the respective VSL routine. For example, toobtain a vector r, composed of n independent and identically distributed random numbers withnormal (Gaussian) distribution, that have the mean value a and standard deviation sigma, writethe following:

for FORTRAN-interfacestatus = vsrnggaussian( method, stream, n, r, a, sigma )

for C-interfacestatus = vsRngGaussian( method, stream, n, r, a, sigma )

The name of a method parameter has the following structure:

VSL_METHOD_<precision><distribution>_<method>,

VSL_METHOD_<precision><distribution>_<method>_ACCURATE,

where

<precision> for single precision continuous distributionS

for double precision continuous distributionD

for discrete distributionI

probability distribution<distribution>

method name.<method>

Type of name structure for method parameter corresponds to fast and accuarate modes ofrandom number generation (see “Distribution Generators” section and VSL Notes for details).

Method names VSL_METHOD_<precision><distribution>_<method> andVSL_METHOD_<precision><distribution>_<method>_ACCURATE should be used withvsl<precision>Rng<distribution> function only, where

<precision> for single precision continuous distributions

for double precision continuous distributiond

for discrete distributioni

probability distribution.<distribution>

Table 10-1 provides specific predefined values of the method name. The third column containsnames of the functions that use the given method.

2286

Table 10-1 Values of <method> in method parameter

FunctionsShort DescriptionMethod

Uniform(continuous),Uniform(discrete),UniformBits

Standard method. Currently there is only one method forthese functions.

STD

Gaussian,GaussianMV

BOXMULLER generates normally distributed random numberx thru the pair of uniformly distributed numbers u1 and u2according to the formula:

BOXMULLER

Gaussian,GaussianMV

BOXMULLER2 generates normally distributed randomnumbers x1 and x2 thru the pair of uniformly distributednumbers u1 and u2 according to the formulas:

BOXMULLER2

Exponential,Laplace,Weibull,Cauchy,

Inverse cumulative distribution function method.ICDF

Rayleigh,Lognormal,Gumbel,Bernoulli,Geometric,Gaussian,GaussianMV

GammaFor α > 1, a gamma distributed random number is generatedas a cube of properly scaled normal random number; for0.6 ≤α < 1, a gamma distributed random number is

GNORM

generated using rejection from Weibull distribution; for α <

2287

0.6, a gamma distributed random number is obtained usingtransformation of exponential power distribution; for α = 1,gamma distribution is reduced to exponential distribution.

BetaFor min(p, q) > 1, Cheng method is used; for min(p,

q) < 1, Jöhnk method is used, if q + K·p2+ C ≤ 0 (K=0.852..., C=-0.956...) otherwise, Atkinson switching

CJA

algorithm is used; for max(p, q) < 1, method of Jöhnk isused; for min(p, q) < 1, max(p, q)> 1, Atkinsonswitching algorithm is used (CJA stands for the first lettersof Cheng, Jöhnk, Atkinson); for p = 1 or q = 1, inversecumulative distribution function method is used;for p = 1and q = 1, beta distribution is reduced to uniformdistribution.

BinomialAcceptance/rejection method for ntrial·min(p,1 - p)≥30 with decomposition into 4 regions:

BTPE

– 2 parallelograms

– triangle

– left exponential tail

– right exponential tail

HypergeometricAcceptance/rejection method for large mode of distributionwith decomposition into 3 regions:

H2PE

– rectangular



PoissonAcceptance/rejection method for λ≥ 27 with decompositioninto 4 regions:

PTPE

– 2 parallelograms

– triangle


2288

– right exponential tail;

otherwise, table lookup method is used.

Poisson,PoissonV

for λ≥ 1, method based on Poisson inverse CDF

approximation by Gaussian inverse CDF; for λ < 1, tablelookup method is used.

POISNORM

NegBinomialAcceptance/rejection method for ,NBAR

with decomposition into 5 regions:

– rectangular

– 2 trapezoid



Basic Generators

VSL provides the following BRNGs, which differ in speed and other properties:

• the 32-bit multiplicative congruential pseudorandom number generator MCG(1132489760,231 -1) [L’Ecuyer99]

• the 32-bit generalized feedback shift register pseudorandom number generatorGFSR(250,103) [Kirkpatrick81]

• the combined multiple recursive pseudorandom number generator MRG-32k3a [L’Ecuyer99a]

• the 59-bit multiplicative congruential pseudorandom number generator MCG(1313, 259)from NAG Numerical Libraries [NAG]

• Wichmann-Hill pseudorandom number generator (a set of 273 basic generators) from NAGNumerical Libraries [NAG]

2289

• Mersenne Twister pseudorandom number generator MT19937 [Matsumoto98] with periodlength 219937-1 of the produced sequence

• Set of 1024 Mersenne Twister pseudorandom number generators MT2203 [Matsumoto98],[Matsumoto00]. Each of them generates a sequence of period length equal to 22203-1.Parameters of the generators provide mutual independence of the corresponding sequences.

Besides these pseudorandom number generators, VSL provides two basic quasi-random numbergenerators:

• Sobol quasi-number generator [Sobol76], [Bratley88], which works in dimensions from 1up to 40.

• Niederreiter quasi-random number generator [Bratley92], which works in dimensions from1 up to 318.

VSL also provides opportunity to register externally defined initialization parameters of thequasi-random number generators and to work in dimensions established by user. See additionaldetails on interface for registration of the parameters in the library in VSL Notes.

Aslo see some testing results for the generators in VSL Notes and comparative performancedata at http://www.intel.com/software/products/mkl/data/vsl/vsl_performance_data.htm.

VSL provides means of registration of such user-designed generators through the steps describedin Advanced Service Subroutines section.

For some basic generators, VSL provides two methods of creating independent random streamsin multiprocessor computations, which are the leapfrog method and the block-splitting method.These sequence splitting methods are also useful in sequential Monte Carlo.

In addition, MT2203 pseudorandom number generator is a set of 1024 generators designed tocreate up to 1024 independent random sequences, which might be used in parallel Monte Carlosimulations. Another generator that has the same feature is Wichmann-Hill. It allows creatingup to 273 independent random streams. The properties of the generators designed for parallelcomputations are discussed in detail in [Coddington94].

You may want to design and use your own basic generators. VSL provides means of registrationof such user-designed generators through the steps described in Advanced ServiceSubroutinessection.

There is also an option to utilize externally generated random numbers in VSL distributiongenerator routines. For this purpose VSL provides three additional basic random numbergenerators:

– for external random data packed in 32-bit integer array

– for external random data stored in double precision floating-point array; data is supposedto be uniformly distributed over (a,b) interval

– for external random data stored in single precision floating-point array; data is supposedto be uniformly distributed over (a,b) interval.

2290


Such basic generators are called the abstract basic random number generators.

See VSL Notes for a more detailed description of the generator properties.

BRNG Parameter Definition

Predefined values for the brng input parameter are as follows:

Table 10-2 Values of brng parameter

Short DescriptionValue

A 31-bit multiplicative congruential generator.VSL_BRNG_MCG31

A generalized feedback shift register generator.VSL_BRNG_R250

A combined multiple recursive generator with twocomponents of order 3.

VSL_BRNG_MRG32K3A

A 59-bit multiplicative congruential generator.VSL_BRNG_MCG59

A set of 273 Wichmann-Hill combined multiplicativecongruential generators.

VSL_BRNG_WH

A Mersenne Twister pseudorandom number generator.VSL_BRNG_MT19937

A set of 1024 Mersenne Twister pseudorandom numbergenerators.

VSL_BRNG_MT2203

A 32-bit Gray code-based generator producing

low-discrepancy sequences for dimensions 1 ≤ s ≤40; user-defined dimensions are also available.

VSL_BRNG_SOBOL

A 32-bit Gray code-based generator producing

low-discrepancy sequences for dimensions 1 ≤ s ≤318; user-defined dimensions are also available.

VSL_BRNG_NIEDERR

An abstract random number generator for integer arrays.VSL_BRNG_IABSTRACT

An abstract random number generator for doubleprecision floating-point arrays.

VSL_BRNG_DABSTRACT

An abstract random number generator for singleprecision floating-point arrays.

VSL_BRNG_SABSTRACT

2291


See VSL Notes for detailed description.

Random Streams

Random stream (or stream) is an abstract source of pseudo- and quasi-random sequences ofuniform distribution. Users have no direct access to these sequences and operate with streamstate descriptors only. A stream state descriptor, which holds state descriptive information fora particular BRNG, is a necessary parameter in each routine of a distribution generator. Onlythe distribution generator routines operate with random streams directly. See VSL Notes fordetails.

NOTE. Random streams associated with abstract basic random number generator arecalled the abstract random streams. See VSL Notes for detailed description of abstractstreams and their use.

User can create unlimited number of random streams by VSL Service Routines like NewStreamand utilize them in any distribution generator to get the sequence of numbers of given probabilitydistribution. When they are no longer needed, the streams should be deleted calling serviceroutine DeleteStream.

VSL provides service functions SaveStreamF and LoadStreamF to save random streamdescriptive data to a binary file and to read this data from a binary file respectively. See VSLNotes for detailed description.

Data Types

FORTRAN:TYPEVSL_STREAM_STATE

INTEGER*4 descriptor1

INTEGER*4 descriptor2

ENDTYPE VSL_STREAM_STATE

C:typedef (void*) VSLStreamStatePtr;

See Advanced Service Routines for the format of the stream state structure for user-designedgenerators.

2292


Error Reporting

VSL routines return status codes of the performed operation to report errors and warnings tothe calling program. Thus, it is up to the application to perform error-related actions and/orrecover from the error. The status codes are of integer type and have the following format:

VSL_ERROR_<ERROR_NAME> - indicates VSL errors

VSL_WARNING_<WARNING_NAME> - indicates VSL warnings.

VSL errors are of negative values while warnings are of positive values. The status code of zerovalue indicates that the operation is completed successfully: VSL_ERROR_OK (or synonymicVSL_STATUS_OK).

Table 10-3 Status Codes and Messages

MessageStatus Code

Indicates no error, execution is successful.VSL_ERROR_OK, VSL_STATUS_OK

Input argument value is not valid.VSL_ERROR_BAD_ARG

Input pointer argument is NULL.VSL_ERROR_NULL_PTR

System cannot allocate memory.VSL_ERROR_MEM_FAILURE

BRNG index is not valid.VSL_ERROR_INVALID_BRNG_INDEX

Two BRNGs are not compatible for the operation.VSL_ERROR_BRNGS_INCOMPATIBLE

BRNG does not support Leapfrog method.VSL_ERROR_LEAPFROG_UNSUPPORTED

BRNG does not support Skip-Ahead method.VSL_ERROR_SKIPAHEAD_UNSUPPORTED

The random stream is invalid.VSL_ERROR_BAD_STREAM

Indicates an error in opening the file.VSL_ERROR_FILE_OPEN

Indicates an error in reading the file.VSL_ERROR_FILE_READ

Indicates an error in writing the file.VSL_ERROR_FILE_WRITE

Indicates an error in closing the file.VSL_ERROR_FILE_CLOSE

2293

MessageStatus Code

File format is unknown.VSL_ERROR_BAD_FILE_FORMAT

File format version is not supported.VSL_ERROR_UNSUPPORTED_FILE_VER

Registration cannot be completed due to lack offree entries in the table of registered BRNGs.

VSL_ERROR_BRNG_TABLE_FULL

The value in StreamStateSize field is bad.VSL_ERROR_BAD_STREAM_STATE_SIZE

The value in WordSize field is bad.VSL_ERROR_BAD_WORD_SIZE

The value in NSeeds field is bad.VSL_ERROR_BAD_NSEEDS

The value in NBits field is bad.VSL_ERROR_BAD_NBITS

Callback function for an abstract BRNG returns aninvalid number of updated entries in a buffer, thatis, < 0 or >nmax.

VSL_ERROR_BAD_UPDATE

Callback function for an abstract BRNG returns zeroas the number of updated entries in a buffer.

VSL_ERROR_NO_NUMBERS

The abstract random stream is invalid.VSL_ERROR_INVALID_ABSTRACT_STREAM

Service Routines

Stream handling comprises routines for creating, deleting, or copying the streams and gettingthe index of a basic generator. A random stream can also be saved to and then read from abinary file. Table 10-4 lists all available service routines

Table 10-4 Service Routines

Short DescriptionRoutine

Creates and initializes a random stream.NewStream

Creates and initializes a random stream for thegenerators with multiple initial conditions.

NewStreamEx

2294

Short DescriptionRoutine

Creates and initializes an abstract random stream forinteger arrays.

iNewAbstractStream

Creates and initializes an abstract random stream fordouble precision floating-point arrays.

dNewAbstractStream

Creates and initializes an abstract random stream forsingle precision floating-point arrays.

sNewAbstractStream

Deletes previously created stream.DeleteStream

Copies a stream to another stream.CopyStream

Creates a copy of a random stream state.CopyStreamState

Writes a stream to a binary file.SaveStreamF

Reads a stream from a binary file.LoadStreamF

Initializes the stream by the leapfrog method to generatea subsequence of the original sequence.

LeapfrogStream

Initializes the stream by the skip-ahead method.SkipAheadStream

Obtains the index of the basic generator responsible forthe generation of a given random stream.

GetStreamStateBrng

Obtains the number of currently registered basicgenerators.

GetNumRegBrngs

NOTE. In the above table, the vsl prefix in the function names is omitted. In the functionreference this prefix is always used in function prototypes and code examples.

Most of the generator-based work comprises three basic steps:

1. Creating and initializing a stream (NewStream, NewStreamEx, CopyStream,CopyStreamState, LeapfrogStream, SkipAheadStream).

2. Generating random numbers with given distribution, see Distribution Generators.

3. Deleting the stream (DeleteStream).

2295


Note that you can concurrently create multiple streams and obtain random data from one orseveral generators by using the stream state. You must use the DeleteStream function todelete all the streams afterwards.

NewStreamCreates and initializes a random stream.

Syntax

Fortran:

status = vslnewstream( stream, brng, seed )

C:

status = vslNewStream( &stream, brng, seed );

Description

For a basic generator with number brng, this function creates a new stream and initializes itwith a 32-bit seed. The seed is an initial value used to select a particular sequence generatedby the basic generator brng. The function is also applicable for generators with multiple initialconditions. See VSL Notes for a more detailed description of stream initialization for differentbasic generators.

NOTE. This function is not applicable for abstract basic random number generators.Please use vsliNewAbstractStream, vslsNewAbstractStream orvsldNewAbstractStream to utilize integer, single-precision or double-precision externalrandom data respectively.

Input Parameters

DescriptionTypeName

Index of the basic generator to initialize the stream.See Table 10-2 for specific value.

FORTRAN: INTEGER,INTENT(IN)

C: int

brng

2296


DescriptionTypeName

Initial condition of the stream. In the case of aquasi-random number generator seed parameter isused to set the dimension. If the dimension is greater


C: unsigned int

seed

than the dimension that brng can support or is lessthan 1, then the dimension is assumed to be equalto 1.

Output Parameters

DescriptionTypeName

Stream state descriptorFORTRAN:TYPE(VSL_STREAM_STATE),INTENT(OUT)

stream

C: VSLStreamStatePtr*

Return Values


BRNG index is invalid.VSL_ERROR_INVALID_BRNG_INDEX

System cannot allocate memory for stream.VSL_ERROR_MEM_FAILURE

NewStreamExCreates and initializes a random stream forgenerators with multiple initial conditions.

Syntax

Fortran:

status = vslnewstreamex( stream, brng, n, params )

C:

status = vslNewStreamEx( &stream, brng, n, params );

2297


Description

This function provides an advanced tool to set the initial conditions for a basic generator if itsinput arguments imply several initialization parameters. Initial values are used to select aparticular sequence generated by the basic generator brng. Whenever possible, use NewStream,which is analogous to vslNewStreamEx except that it takes only one 32-bit initial condition.In particular, vslNewStreamEx may be used to initialize the state table in Generalized FeedbackShift Register Generators (GFSRs). A more detailed description of this issue can be found inVSL Notes.

This function is also used to pass user-defined initialization parameters of quasi-random numbergenerators into the library. See VSL Notes for the format for their passing and registration inVSL.

NOTE. This function is not applicable for abstract basic random number generators.Please use vsliNewAbstractStream, vslsNewAbstractStream orvsldNewAbstractStream to utilize integer, single-precision or double-precision externalrandom data respectively.

Input Parameters

DescriptionTypeName

Index of the basic generator to initialize the stream.See Table 10-2 for specific value.


C: int

brng

Number of initial conditions contained in paramsFORTRAN: INTEGER,INTENT(IN)

n

C: unsigned int

Array of initial conditions necessary for the basicgenerator brng to initialize the stream. In the caseof a quasi-random number generator only the first


C: const unsigned int

params

element in params parameter is used to set thedimension. If the dimension is greater than thedimension that brng can support or is less than 1,then the dimension is assumed to be equal to 1.

2298


Output Parameters

DescriptionTypeName

Stream state descriptorFORTRAN:TYPE(VSL_STREAM_STATE),INTENT(OUT)

stream


Return Values




iNewAbstractStreamCreates and initializes an abstract random streamfor integer arrays.

Syntax

Fortran:

status = vslinewabstractstream( stream, n, ibuf, icallback )

C:

status = vsliNewAbstractStream( &stream, n, ibuf, icallback );

Description

This function creates a new abstract stream and associates it with an integer array ibuf anduser's callback function icallback that is intended for updating of ibuf content.

Input Parameters

DescriptionTypeName

Size of the array ibufFORTRAN: INTEGER,INTENT(IN)

n

2299


DescriptionTypeName

C: int

Array of n 32-bit integersFORTRAN: INTEGER,INTENT(IN)

ibuf

C: unsigned int*

FORTRAN: Address of the callback function used foribuf update

FORTRAN: See Note below

C: See Note below

icallback

C: Pointer to the callback function used for ibufupdate

Output Parameters

DescriptionTypeName

Descriptor of the stream state structureFORTRAN:TYPE(VSL_STREAM_STATE),TINTENT(OUT)

stream


NOTE. Format of the callback function in Fortran:

INTEGER FUNCTION IUPDATEFUNC[C]( stream, n, ibuf, nmin, nmax, idx )

TYPE(VSL_STREAM_STATE),POINTER :: stream[reference]

INTEGER(KIND=4),INTENT(IN) :: n[reference]

INTEGER(KIND=4),INTENT(OUT) :: ibuf[reference](0:n-1)

INTEGER(KIND=4),INTENT(IN) :: nmin[reference]

INTEGER(KIND=4),INTENT(IN) :: nmax[reference]

INTEGER(KIND=4),INTENT(IN) :: idx[reference]

Format of the callback function in C:

int iUpdateFunc( VSLStreamStatePtrstream, int* n, unsigned int ibuf[],int* nmin, int* nmax, int* idx );

2300


The callback function returns the number of elements in the array actually updated by thefunction. Table 10-5 gives the description of the callback function parameters.

Table 10-5 icallback Callback Function Parameters

Short DescriptionParameters

Abstract stream descriptorstream

Size of ibufn

Array of random numbers associated with the streamstream

ibuf

Minimal quantity of numbers to updatenmin

Maximal quantity of numbers that can be updatednmax

Position in cyclic buffer ibuf to start update 0 ≤ idx< n.

idx

Return Values


Parameter n is not positive.VSL_ERROR_BAD_ARG


Either buffer or callback function parameter is a NULLpointer.

VSL_ERROR_NULL_PTR

dNewAbstractStreamCreates and initializes an abstract random streamfor double precision floating-point arrays.

Syntax

Fortran:

status = vsldnewabstractstream( stream, n, dbuf, a, b, dcallback )

2301

C:

status = vsldNewAbstractStream( &stream, n, dbuf, a, b, dcallback );

Description

This function creates a new abstract stream for double precision floating-point arrays withrandom numbers of the uniform distribution over interval (a,b). The function associates thestream with a double precision array dbuf and user's callback function dcallback that isintended for updating of dbuf content.

Input Parameters

DescriptionTypeName

Size of the array dbufFORTRAN: INTEGER,INTENT(IN)

n

C: int

Array of n double precision floating-point randomnumbers with uniform distribution over interval (a,b)

FORTRAN: DOUBLEPRECISION, INTENT(IN)

C: double*

dbuf

Left boundary aFORTRAN: DOUBLEPRECISION, INTENT(IN)

a

C: double

Right boundary bFORTRAN: DOUBLEPRECISION, INTENT(IN)

b

C: double

FORTRAN: Address of the callback function used forupdate of the array dbuf


C: See Note below

dcallback

C: Pointer to the callback function used for update ofthe array dbuf

2302


Output Parameters

DescriptionTypeName

Descriptor of the stream state structureFORTRAN:TYPE(VSL_STREAM_STATE),INTENT(OUT)

stream



INTEGER FUNCTION DUPDATEFUNC[C]( stream, n, dbuf, nmin, nmax, idx )



REAL(KIND=8), INTENT(OUT) :: dbuf[reference](0:n-1)





int dUpdateFunc( VSLStreamStatePtr stream, int* n, double dbuf[], int*nmin, int* nmax, int* idx );


dcallback Callback Function Parameters



Size of dbufn


dbuf

2303


Position in cyclic buffer dbuf to start update 0 ≤ idx< n.

idx

Return Values





VSL_ERROR_NULL_PTR

sNewAbstractStreamCreates and initializes an abstract random streamfor single precision floating-point arrays.

Syntax

Fortran:

status = vslsnewabstractstream( stream, n, sbuf, a, b, scallback )

C:

status = vslsNewAbstractStream( &stream, n, sbuf, a, b, scallback );

Description

This function creates a new abstract stream for single precision floating-point arrays withrandom numbers of the uniform distribution over interval (a,b). The function associates thestream with a single precision array sbuf and user’s callback function scallback that is intendedfor updating of sbuf content.

2304

Input Parameters

DescriptionTypeName

Size of the array sbufFORTRAN: INTEGER,INTENT(IN)

n

C: int

Array of n single precision floating-point randomnumbers with uniform distribution over interval (a,b)

FORTRAN: REAL,INTENT(IN)

C: float*

sbuf

Left boundary aFORTRAN: REAL,INTENT(IN)

a

C: float

Right boundary bFORTRAN: REAL,INTENT(IN)

b

C: float

FORTRAN: Address of the callback function used forupdate of the array sbuf


C: See Note below

scallback

C: Pointer to the callback function used for updateof the array sbuf

Output Parameters

DescriptionTypeName

Descriptor of the stream state structureFORTRAN:TYPE(VSL_STREAM_STATE),INTENT(OUT)

stream


2305


INTEGER FUNCTION SUPDATEFUNC[C]( stream, n, sbuf, nmin, nmax, idx )



REAL(KIND=4), INTENT(OUT) :: sbuf[reference](0:n-1)





int sUpdateFunc( VSLStreamStatePtr stream, int* n, float sbuf[], int*nmin, int* nmax, int* idx );


Table 10-7 scallback Callback Function Parameters



Size of sbufn


sbuf



Position in cyclic buffer sbuf to start update 0 ≤ idx< n.

idx

Return Values


2306




VSL_ERROR_NULL_PTR

DeleteStreamDeletes a random stream.

Syntax

Fortran:

status = vsldeletestream( stream )

C:

status = vslDeleteStream( &stream );

Description

This function deletes the random stream created by one of the initialization functions.

Input/Output Parameters

DescriptionTypeName

FORTRAN: Stream state descriptor. Must havenon-zero value. After the stream is successfullydeleted, the descriptor becomes invalid.

FORTRAN:TYPE(VSL_STREAM_STATE),INTENT(OUT)

stream

C: Stream state descriptor. Must have non-zero value.After the stream is successfully deleted, the pointeris set to NULL.


Return Values


srcstream parameter is a NULL pointer.VSL_ERROR_NULL_PTR

srcstream is not a valid random stream.VSL_ERROR_BAD_STREAM

System cannot allocate memory for newstream.VSL_ERROR_MEM_FAILURE

2307


CopyStreamCreates a copy of a random stream.

Syntax

Fortran:

status = vslcopystream( newstream, srcstream )

C:

status = vslCopyStream( &newstream, srcstream );

Description

The function creates an exact copy of srcstream and stores its descriptor to newstream.

Input Parameters

DescriptionTypeName

FORTRAN: Descriptor of the stream to be copiedFORTRAN:TYPE(VSL_STREAM_STATE),INTENT(IN)

srcstream

C: Pointer to the stream state structure to be copied

C: VSLStreamStatePtr

Output Parameters

DescriptionTypeName

Copied stream descriptorFORTRAN:TYPE(VSL_STREAM_STATE),INTENT(OUT)

newstream


Return Values


srcstream parameter is a NULL pointer.VSL_ERROR_NULL_PTR

srcstream is not a valid random stream.VSL_ERROR_BAD_STREAM

2308


System cannot allocate memory for newstream.VSL_ERROR_MEM_FAILURE

CopyStreamStateCreates a copy of a random stream state.

Syntax

Fortran:

status = vslcopystreamstate( deststream, srcstream )

C:

status = vslCopyStreamState( deststream, srcstream );

Description

The function copies a stream state from srcstream to the existing deststream stream. Boththe streams should be generated by the same basic generator. En error message is generatedwhen the index of the BRNG that produced deststream stream differs from the index of theBRNG that generated srcstream stream.

Unlike CopyStream function, which creates a new stream and copies both the stream stateand other data from srcstream, the function CopyStreamState copies only srcstream streamstate data to the generated deststream stream.

Input Parameters

DescriptionTypeName

FORTRAN: Descriptor of the destination streamwhere the state of scrstream stream is copied

FORTRAN:TYPE(VSL_STREAM_STATE),INTENT(IN)

srcstream

C: Pointer to the stream state structure, from whichthe state structure is copiedC: VSLStreamStatePtr

2309


Output Parameters

DescriptionTypeName

FORTRAN: Descriptor of the stream with the stateto be copied

FORTRAN:TYPE(VSL_STREAM_STATE),INTENT(OUT)

deststream

C: Pointer to the stream state structure where thestream state is copiedC: VSLStreamStatePtr

Return Values


Either srcstream or deststream is a NULLpointer.

VSL_ERROR_NULL_PTR

Either srcstream or deststream is not a validrandom stream.

VSL_ERROR_BAD_STREAM

BRNG associated with srcstream is not compatiblewith BRNG associated with deststream.

VSL_ERROR_BRNGS_INCOMPATIBLE

SaveStreamFWrites random stream descriptive data to binaryfile.

Syntax

Fortran:

errstatus = vslsavestreamf( stream, fname )

C:

errstatus = vslSaveStreamF( stream, fname );

Description

This function writes the random stream descriptive data to the binary file. Random streamdescriptive data is saved to the binary file with the name fname. Random stream stream mustbe a valid stream created by NewStream-like or CopyStream-like service routines. If the streamcannot be saved to the file, errstatus has a non-zero value. Random stream can be read fromthe binary file using LoadStreamF function.

2310


Input Parameters

DescriptionTypeName

Random stream to be written to the fileFORTRAN:TYPE(VSL_STREAM_STATE),INTENT(IN)

stream


FORTRAN: File name specified as a C-stylenull-terminated string

FORTRAN: CHARACTER(*),INTENT(IN)

fname

C: File name specified as a Fortran-style characterstring

C: char*

Output Parameters

DescriptionTypeName

Error status of the operationFORTRAN: INTEGER

C: int

errstatus

Return Values


Either fname or stream is a NULL pointer.VSL_ERROR_NULL_PTR

stream is not a valid random stream.VSL_ERROR_BAD_STREAM




System cannot allocate memory for internal needs.VSL_ERROR_MEM_FAILURE

2311


LoadStreamFCreates new stream and reads stream descriptivedata from binary file.

Syntax

Fortran:

errstatus = vslloadstreamf( stream, fname )

C:

errstatus = vslLoadStreamF( &stream, fname );

Description

This function creates a new stream and reads stream descriptive data from the binary file. Anew random stream is created using the stream descriptive data from the binary file with thename fname. If the stream cannot be read (for example, I/O error occurs or the file format isinvalid), errstatus has a non-zero value. To save random stream to the file, use SaveStreamFfunction.

Input Parameters

DescriptionTypeName

FORTRAN: File name specified as a C-stylenull-terminated string

FORTRAN: CHARACTER(*),INTENT(IN)

fname

C: File name specified as a Fortran-style characterstring

C: char*

Output Parameters

DescriptionTypeName

FORTRAN: Descriptor of a new random streamFORTRAN:TYPE(VSL_STREAM_STATE),INTENT(OUT)

stream

C: Pointer to a new random stream


2312


DescriptionTypeName

Error status of the operationFORTRAN: INTEGER

C: int

errstatus

Return Values


fname is a NULL pointer.VSL_ERROR_NULL_PTR




System cannot allocate memory for internal needs.VSL_ERROR_MEM_FAILURE

Unknown file format.VSL_ERROR_BAD_FILE_FORMAT

File format version is unsupported.VSL_ERROR_UNSUPPORTED_FILE_VER

LeapfrogStreamInitializes a stream using the leapfrog method.

Syntax

Fortran:

status = vslleapfrogstream( stream, k, nstreams )

C:

status = vslLeapfrogStream( stream, k, nstreams );

Description

The function allows generating random numbers in a random stream with non-unit stride. Thisfeature is particularly useful in distributing random numbers from original stream acrossnstreams buffers without generating the original random sequence with subsequent manualdistribution. One of the important applications of the leapfrog method is splitting the originalsequence into non-overlapping subsequences across nstreams computational nodes. The

2313


function initializes the original random stream (see Figure 10-1 ) to generate random numbers

for the computational node k, 0 ≤ k < nstreams, where nstreams is the largest number ofcomputational nodes used.

Figure 10-1 Leapfrog Method

The leapfrog method is supported only for those basic generators that allow splitting elementsby the leapfrog method, which is more efficient than simply generating them by a generatorwith subsequent manual distribution across computational nodes. See VSL Notes for details.

For quasi-random basic generators the leapfrog method allows generating individual componentsof quasi-random vectors instead of whole quasi-random vectors. In this case nstreamsparameter should be equal to the dimension of the quasi-random vector while k parameter

should be the index of a component to be generated (0 ≤ k < nstreams). Other parametersvalues are not allowed.

The following code examples illustrate the initialization of three independent streams using theleapfrog method:

2314

Example 10-1 FORTRAN Code for Leapfrog Method...

TYPE(VSL_STREAM_STATE) ::stream1



! Creating 3 identical streams

status = vslnewstream(stream1, VSL_BRNG_MCG31, 174)

status = vslcopystream(stream2, stream1)

status = vslcopystream(stream3, stream1)

! Leapfrogging the streams

status = vslleapfrogstream(stream1, 0, 3)



! Generating random numbers

...

! Deleting the streams

status = vsldeletestream(stream1)



...

2315


Example 10-2 C Code for Leapfrog Method...

VSLStreamStatePtr stream1;



/* Creating 3 identical streams */

status = vslNewStream(&stream1, VSL_BRNG_MCG31, 174);

status = vslCopyStream(&stream2, stream1);


/* Leapfrogging the streams

*/

status = vslLeapfrogStream(stream1, 0, 3);



/* Generating random numbers

*/

...

/* Deleting the streams

*/

status = vslDeleteStream(&stream1);



...

Input Parameters

DescriptionTypeName

FORTRAN: Descriptor of the stream to which leapfrogmethod is applied


stream

2316


DescriptionTypeName

C: Pointer to the stream state structure to whichleapfrog method is applied


Index of the computational node, or stream numberFORTRAN: INTEGER,INTENT(IN)

k

C: int

Largest number of computational nodes, or strideFORTRAN: INTEGER,INTENT(IN)

nstreams

C: int

Return Values


stream is a NULL pointer.VSL_ERROR_NULL_PTR


BRNG does not support Leapfrog method.VSL_ERROR_LEAPFROG_UNSUPPORTED

SkipAheadStreamInitializes a stream using the block-splittingmethod.

Syntax

Fortran:

status = vslskipaheadstream( stream, nskip )

C:

status = vslSkipAheadStream( stream, nskip);

Description

This function skips a given number of elements in a random stream. This feature is particularlyuseful in distributing random numbers from original random stream across differentcomputational nodes. If the largest number of random numbers used by a computational nodeis nskip, then the original random sequence may be split by SkipAheadStream into

2317


non-overlapping blocks of nskip size so that each block corresponds to the respectivecomputational node. The number of computational nodes is unlimited. This method is knownas the block-splitting method or as the skip-ahead method. (see Figure 10-2 ).

Figure 10-2 Block-Splitting Method

The skip-ahead method is supported only for those basic generators that allow skipping elementsby the skip-ahead method, which is more efficient than simply generating them by generatorwith subsequent manual skipping. See VSL Notes for details.

Please note that for quasi-random basic generators the skip-ahead method works withcomponents of quasi-random vectors rather than with whole quasi-random vectors. Thus toskip NS quasi-random vectors, set nskip parameter equal to the NS*DIMEN, where DIMEN isthe dimension of quasi-random vector.

The following code examples illustrate how to initialize three independent streams usingSkipAheadStream function:

2318


Example 10-3 FORTRAN Code for Block-Splitting Method...

type(VSL_STREAM_STATE) ::stream1



! Creating the 1st stream

status = vslnewstream(stream1, VSL_BRNG_MCG31, 174)

! Skipping ahead by 7 elements the 2nd stream

status = vslcopystream(stream2, stream1);

status = vslskipaheadstream(stream2, 7);

! Skipping ahead by 7 elements the 3rd stream

status = vslcopystream(stream3, stream2);

status = vslskipaheadstream(stream3, 7);

! Generating random numbers

...

! Deleting the streams




...

2319


Example 10-4 C Code for Block-Splitting MethodVSLStreamStatePtr stream1;



/* Creating the 1st stream

*/

status = vslNewStream(&stream1, VSL_BRNG_MCG31, 174);

/* Skipping ahead by 7 elements the 2nd stream */


status = vslSkipAheadStream(stream2, 7);

/* Skipping ahead by 7 elements the 3rd stream */


status = vslSkipAheadStream(stream3, 7);

/* Generating random numbers

*/

...

/* Deleting the streams

*/




...

Input Parameters

DescriptionTypeName

FORTRAN: Descriptor of the stream to whichblock-splitting method is applied


stream

C: Pointer to the stream state structure to whichblock-splitting method is appliedC: VSLStreamStatePtr

2320


DescriptionTypeName

Number of skipped elementsFORTRAN: INTEGER,INTENT(IN)

nskip

C: int

Return Values




BRNG does not support Skip-Ahead method.VSL_ERROR_SKIPAHEAD_UNSUPPORTED

GetStreamStateBrngReturns index of a basic generator used forgeneration of a given random stream.

Syntax

Fortran:

brng = vslgetstreamstatebrng( stream )

C:

brng = vslGetStreamStateBrng( stream );

Description

This function retrieves the index of a basic generator used for generation of a given randomstream.

Input Parameters

DescriptionTypeName

FORTRAN: Descriptor of the stream stateFORTRAN:TYPE(VSL_STREAM_STATE),TINTENT(IN)

stream

C: Pointer to the stream state structure

2321


DescriptionTypeName


Output Parameters

DescriptionTypeName

Index of the basic generator assigned for thegeneration of stream ; negative in case of an error

FORTRAN: INTEGER

C: int

brng

Return Values



GetNumRegBrngsObtains the number of currently registered basicgenerators.

Syntax

Fortran:

nregbrngs = vslgetnumregbrngs( )

C:

nregbrngs = vslGetNumRegBrngs( void );

Description

This function obtains the number of currently registered basic generators. Whenever userregisters a user-designed basic generator, the number of registered basic generators isincremented. The maximum number of basic generators that can be registered is determinedby VSL_MAX_REG_BRNGS parameter.

2322


Output Parameters

DescriptionTypeName

Number of basic generators registered at the momentof the function call

FORTRAN: INTEGER

C: int

nregbrngs

Distribution Generators

VSL routines are used to generate random numbers with different types of distribution. Eachfunction group is introduced below by the type of underlying distribution and contains a shortdescription of its functionality, as well as specifications of the call sequence for both FORTRANand C-interface and the explanation of input and output parameters. Table 10-8 and Table10-9 list the random number generator routines, together with used data types, outputdistributions, and sets correspondence between data types of the generator routines and calledbasic random number generators.

Table 10-8 Continuous Distribution Generators

DescriptionBRNGData Type

DataTypes

Type ofDistribution

Uniform continuous distribution on the interval[a,b].

s, ds, dUniform

Normal (Gaussian) distribution.s, ds, dGaussian

Multivariate normal (Gaussian) distribution.s, ds, dGaussianMV

Exponential distribution.s, ds, dExponential

Laplace distribution (double exponentialdistribution).

s, ds, dLaplace

Weibull distribution.s, ds, dWeibull

Cauchy distribution.s, ds, dCauchy

Rayleigh distribution.s, ds, dRayleigh

Lognormal distribution.s, ds, dLognormal

2323


DescriptionBRNGData Type

DataTypes

Type ofDistribution

Gumbel (extreme value) distribution.s, ds, dGumbel

Gamma distribution.s, ds, dGamma

Beta distribution.s, ds, dBeta

Table 10-9 Discrete Distribution Generators

DescriptionBRNG Data TypeDataTypes

Type ofDistribution

Uniform discretedistribution on theinterval [a,b).

diUniform

Generator of integerrandom values withuniform bitdistribution.

iiUniformBits

Bernoulli distribution.siBernoulli

Geometric distribution.siGeometric

Binomial distribution.diBinomial

Hypergeometricdistribution.

diHypergeometric

Poisson distribution.s (forVSL_METHOD_IPOISSON_POISNORM)

iPoisson

s (for distribution parameter λ≥ 27) and

d (for λ < 27) (forVSL_METHOD_IPOISSON_PTPE)

Poisson distributionwith varying mean.

siPoissonV

2324

DescriptionBRNG Data TypeDataTypes

Type ofDistribution

Negative binomialdistribution, or Pascaldistribution.

diNegBinomial

The library provides two modes of random number generation, accurate and fast. Accurategeneration mode is intended for the applications that are highly demanding to accuracy ofcalculations. When used in this mode, the generators produce random numbers lying completelywithin definitional domain for all values of the distribution parameters. For example, randomnumbers obtained from the generator of continuous distribution that is uniform on interval[a,b] belong to this interval irrespective of what a and b values may be. Fast mode provideshigh performance of generation and also guaranties that generated random numbers belongto the definitional domain except for some specific values of distribution parameters. Thegeneration mode is set by specifying relevant value of the method parameter in generatorroutines. List of distributions that support accurate mode of generation is given in the tablebelow.

Distribution Generators Supporting Accurate Mode

Data TypesType of Distribution

s, dUniform

s, dExponential

s, dWeibull

s, dRayleigh

s, dLognormal

s, dGamma

s, dBeta

See additional details about accurate and fast mode of random number generation in VSL Notes.

Continuous Distributions

This section describes routines for generating random numbers with continuous distribution.

2325


UniformGenerates random numbers with uniformdistribution.

Syntax

Fortran:

status = vsrnguniform( method, stream, n, r, a, b )

status = vdrnguniform( method, stream, n, r, a, b )

C:

status = vsRngUniform( method, stream, n, r, a, b );

status = vdRngUniform( method, stream, n, r, a, b );

Description

This function generates random numbers uniformly distributed over the interval [a, b], where

a, b are the left and right bounds of the interval, respectively, and a, b∈R ; a > b.

The probability density function is given by:

The cumulative distribution function is as follows:

2326


Input Parameters

DescriptionTypeName

Generation method; the specific values are as follows:

VSL_METHOD_SUNIFORM_STD

VSL_METHOD_DUNIFORM_STD

VSL_METHOD_SUNIFORM_STD_ACCURATE

VSL_METHOD_DUNIFORM_STD_ACCURATE


C: int

method

Standard method.

FORTRAN: Descriptor of the stream state structure.FORTRAN: TYPE(VSL_STREAM_STATE),INTENT(IN)

stream



Number of random values to be generatedFORTRAN: INTEGER,INTENT(IN)

n

C: int

Left bound aFORTRAN: REAL,INTENT(IN) forvsrnguniform

a

DOUBLE PRECISION,INTENT(IN) forvdrnguniform

C: float for vsRngUniform

double for vdRngUniform

Right bound bFORTRAN: REAL,INTENT(IN) forvsrnguniform

b

2327


DescriptionTypeName

DOUBLE PRECISION,INTENT(IN) forvdrnguniform

C: float for vsRngUniform

double for vdRngUniform

Output Parameters

DescriptionTypeName

Vector of n random numbers uniformly distributedover the interval [a,b]

FORTRAN: REAL,INTENT(OUT) forvsrnguniform

r

DOUBLE PRECISION,INTENT(OUT) forvdrnguniform

C: float* for vsRngUniform

double* for vdRngUniform

Return Values




Callback function for an abstract BRNG returns aninvalid number of updated entries in a buffer, thatis, < 0 or > nmax.


Callback function for an abstract BRNG returns 0 asthe number of updated entries in a buffer.


2328

GaussianGenerates normally distributed random numbers.

Syntax

Fortran:

status = vsrnggaussian( method, stream, n, r, a, sigma )

status = vdrnggaussian( method, stream, n, r, a, sigma )

C:

status = vsRngGaussian( method, stream, n, r, a, sigma );

status = vdRngGaussian( method, stream, n, r, a, sigma );

Description

This function generates random numbers with normal (Gaussian) distribution with mean value

a and standard deviation σ, where

a, σ∈R ; σ > 0.



2329


The cumulative distribution function Fa,σ(x) can be expressed in terms of standard normal

distribution Φ(x) as

Fa,σ(x) = Φ((x - a)/σ)

Input Parameters

DescriptionTypeName

Generation method. The specific values are as follows:

VSL_METHOD_SGAUSSIAN_BOXMULLER

VSL_METHOD_SGAUSSIAN_BOXMULLER2

VSL_METHOD_SGAUSSIAN_ICDF

VSL_METHOD_DGAUSSIAN_BOXMULLER

VSL_METHOD_DGAUSSIAN_BOXMULLER2

VSL_METHOD_DGAUSSIAN_ICDF


C: int

method

See brief description of the methods BOXMULLER,BOXMULLER2, and ICDF in Table 10-1


stream




n

C: int

Mean value aFORTRAN: REAL,INTENT(IN) forvsrnggaussian

a

DOUBLE PRECISION,INTENT(IN) forvdrnggaussian

C: float for vsRngGaussian

2330


DescriptionTypeName

double for vdRngGaussian

Standard deviation σFORTRAN: REAL,INTENT(IN) forvsrnggaussian

sigma

DOUBLE PRECISION,INTENT(IN) forvdrnggaussian

C: float for vsRngGaussian

double for vdRngGaussian

Output Parameters

DescriptionTypeName

Vector of n normally distributed random numbersFORTRAN: REAL,INTENT(OUT) forvsrnggaussian

r

DOUBLE PRECISION,INTENT(OUT) forvdrnggaussian

C: float* forvsRngGaussian

double* for vdRngGaussian

Return Values






2331




GaussianMVGenerates random numbers from multivariatenormal distribution.

Syntax

Fortran:

status = vsrnggaussianmv( method, stream, n, r, dimen, mstorage, a, t )

status = vdrnggaussianmv( method, stream, n, r, dimen, mstorage, a, t )

C:

status = vsRngGaussianMV( method, stream, n, r, dimen, mstorage, a, t );

status = vdRngGaussianMV( method, stream, n, r, dimen, mstorage, a, t );

Description

This function generates random numbers with d-variate normal (Gaussian) distribution with

mean value a and variance-covariance matrix C, where a∈Rd; C is a d×d symmetricpositive-definite matrix.


where x∈Rd .

Matrix C can be represented as C = TTT, where T is a lower triangular matrix - Cholesky factorof C.

2332


Instead of variance-covariance matrix C the generation routines require Cholesky factor of Cin input. To compute Cholesky factor of matrix C, the user may call MKL LAPACK routines formatrix factorization: ?potrf or ?pptrf for v?RngGaussianMV/v?rnggaussianmv routines (?means either s or d for single and double precision respectively). See Application Notes formore details.

Input Parameters

DescriptionTypeName


VSL_METHOD_SGAUSSIANMV_BOXMULLER

VSL_METHOD_SGAUSSIANMV_BOXMULLER2

VSL_METHOD_SGAUSSIANMV_ICDF

VSL_METHOD_DGAUSSIANMV_BOXMULLER

VSL_METHOD_DGAUSSIANMV_BOXMULLER2

VSL_METHOD_DGAUSSIANMV_ICDF


C: int

method

See brief description of the methods BOXMULLER,BOXMULLER2, and ICDF in Table 10-1


stream




n

C: int

Dimension d ( d ≥ 1) of output random vectorsFORTRAN: INTEGER,INTENT(IN)

dimen

C: int

FORTRAN: Matrix storage scheme for uppertriangular matrix TT. The routine supports threematrix storage schemes:


C: int

mstorage

2333


DescriptionTypeName

• VSL_MATRIX_STORAGE_FULL— all d x d elementsof the matrix TT are passed, however, only theupper triangle part is actually used in the routine.

• VSL_MATRIX_STORAGE_PACKED— upper triangleelements of TT are packed by rows into aone-dimensional array.

• VSL_MATRIX_STORAGE_DIAGONAL— only diagonalelements of TT are passed.

C: Matrix storage scheme for lower triangular matrixT. The routine supports three matrix storage schemes:

• VSL_MATRIX_STORAGE_FULL— all d x d elementsof the matrix T are passed, however, only thelower triangle part is actually used in the routine.

• VSL_MATRIX_STORAGE_PACKED— lower triangleelements of T are packed by rows into aone-dimensional array.

• VSL_MATRIX_STORAGE_DIAGONAL— only diagonalelements of T are passed.

Mean vector a of dimension dFORTRAN: REAL,INTENT(IN) forvsrnggaussianmv

a

DOUBLE PRECISION,INTENT(IN) forvdrnggaussianmv

C: float* forvsRngGaussianMV

double* forvdRngGaussianMV

2334


DescriptionTypeName

FORTRAN: Elements of the upper triangular matrixpassed according to the matrix TT storage schememstorage.

FORTRAN: REAL,INTENT(IN) forvsrnggaussianmv

t

DOUBLE PRECISION,INTENT(IN) forvdrnggaussianmv

C: Elements of the lower triangular matrix passedaccording to the matrix T storage scheme mstorage.



Output Parameters

DescriptionTypeName

Array of n random vectors of dimension dimenFORTRAN: REAL,INTENT(OUT) forvsrnggaussianmv

r

DOUBLE PRECISION,INTENT(OUT) forvdrnggaussianmv



Application Notes

Since matrices are stored in Fortran by columns, while in C they are stored by rows, the usageof MKL factorization routines (assuming Fortran matrices storage) in combination withmultivariate normal RNG (assuming C matrix storage) is slightly different in C and Fortran. Thefollowing tables help in using these routines in C and Fortran. For further information pleaserefer to the appropriate VSL example file.

2335


Using Cholesky Factorization Routines in Fortran

Result ofFactorizationas InputArgumentfor RNG

UPLOParameterinFactorizationRoutine

FactorizationRoutine

Variance-CovarianceMatrix Argument

Matrix Storage Scheme

Uppertriangle ofTT. Lowertriangle isnot used.

‘U’spotrf forvsrnggaussianmv

dpotrf forvdrnggaussianmv

C in Fortrantwo-dimensionalarray

VSL_MATRIX_STORAGE_FULL

Uppertriangle of TT

packed by

‘L’spptrf forvsrnggaussianmv

dpptrf forvdrnggaussianmv

Lower triangle of Cpacked by columnsintoone-dimensionalarray

VSL_MATRIX_STORAGE_PACKED

rows intoone-dimensionalarray.

Using Cholesky Factorization Routines in C






Uppertriangle ofTT. Lowertriangle isnot used.

‘U’spotrf forvsRngGaussianMV

dpotrf forvdRngGaussianMV

C in Ctwo-dimensionalarray

VSL_MATRIX_STORAGE_FULL

Uppertriangle of TT

packed by

‘L’spptrf forvsRngGaussianMV

dpptrf forvdRngGaussianMV

Lower triangle of Cpacked by columnsintoone-dimensionalarray

VSL_MATRIX_STORAGE_PACKED

2336







rows intoone-dimensionalarray.

Return Values








ExponentialGenerates exponentially distributed randomnumbers.

Syntax

Fortran:

status = vsrngexponential( method, stream, n, r, a, beta )

status = vdrngexponential( method, stream, n, r, a, beta )

C:

status = vsRngExponential( method, stream, n, r, a, beta );

status = vdRngExponential( method, stream, n, r, a, beta );

2337


Description

This function generates random numbers with exponential distribution that has displacement

a and scalefactor β, where a, β∈R ; β > 0.



Input Parameters

DescriptionTypeName


VSL_METHOD_SEXPONENTIAL_ICDF

VSL_METHOD_DEXPONENTIAL_ICDF

VSL_METHOD_SEXPONENTIAL_ICDF_ACCURATE

VSL_METHOD_DEXPONENTIAL_ICDF_ACCURATE


C: int

method

Inverse cumulative distribution function method


stream



2338


DescriptionTypeName


n

C: int

Displacement aFORTRAN: REAL,INTENT(IN) forvsrngexponential

a

DOUBLE PRECISION,INTENT(IN) forvdrngexponential

C: float forvsRngExponential

C: double forvdRngExponential

Scalefactor βFORTRAN: REAL,INTENT(IN) forvsrngexponential

beta

DOUBLE PRECISION,INTENT(IN) forvdrngexponential

C: float forvsRngExponential

double forvdRngExponential

Output Parameters

DescriptionTypeName

Vector of n exponentially distributed random numbersFORTRAN: REAL,INTENT(OUT) forvsrngexponential

r

2339


DescriptionTypeName

DOUBLE PRECISION,INTENT(OUT) forvdrngexponential

C: float* forvsRngExponential

double* forvdRngExponential

Return Values








LaplaceGenerates random numbers with Laplacedistribution.

Syntax

Fortran:

status = vsrnglaplace( method, stream, n, r, a, beta )

status = vdrnglaplace( method, stream, n, r, a, beta )

C:

status = vsRngLaplace( method, stream, n, r, a, beta );

status = vdRngLaplace( method, stream, n, r, a, beta );

2340


Description

This function generates random numbers with Laplace distribution with mean value (or average)

a and scalefactor β, where a, β∈R ; β > 0. The scalefactor value determines the standarddeviation as



Input Parameters

DescriptionTypeName


VSL_METHOD_SLAPLACE_ICDF

VSL_METHOD_DLAPLACE_ICDF


C: int

method


2341


DescriptionTypeName


stream




n

C: int

Mean value aFORTRAN: REAL,INTENT(IN) forvsrnglaplace

a

DOUBLE PRECISION,INTENT(IN) forvdrnglaplace

C: float for vsRngLaplace

double for vdRngLaplace

Scalefactor βFORTRAN: REAL,INTENT(IN) forvsrnglaplace

beta

DOUBLE PRECISION,INTENT(IN) forvdrnglaplace

C: float for vsRngLaplace

double for vdRngLaplace

Output Parameters

DescriptionTypeName

Vector of n Laplace distributed random numbersFORTRAN: REAL,INTENT(OUT) forvsrnglaplace

r

2342


DescriptionTypeName

DOUBLE PRECISION,INTENT(OUT) forvdrnglaplace

C: float* for vsRngLaplace

double* for vdRngLaplace

Return Values








WeibullGenerates Weibull distributed random numbers.

Syntax

Fortran:

status = vsrngweibull( method, stream, n, r, alpha, a, beta )

status = vdrngweibull( method, stream, n, r, alpha, a, beta )

C:

status = vsRngWeibull( method, stream, n, r, alpha, a, beta );

status = vdRngWeibull( method, stream, n, r, alpha, a, beta );

Description

This function generates Weibull distributed random numbers with displacement a, scalefactor

β, and shape α, where α, β, a∈R ; α > 0 , β > 0.

2343




Input Parameters

DescriptionTypeName


VSL_METHOD_SWEIBULL_ICDF

VSL_METHOD_DWEIBULL_ICDF

VSL_METHOD_SWEIBULL_ICDF_ACCURATE

VSL_METHOD_DWEIBULL_ICDF_ACCURATE


C: int

method



stream



2344


DescriptionTypeName


n

C: int

Shape αFORTRAN: REAL,INTENT(IN) forvsrngweibull

alpha

DOUBLE PRECISION,INTENT(IN) forvdrngweibull

C: float for vsRngWeibull

double for vdRngWeibull

Displacement aFORTRAN: REAL,INTENT(IN) forvsrngweibull

a




Scalefactor βFORTRAN: REAL,INTENT(IN) forvsrngweibull

beta




2345


Output Parameters

DescriptionTypeName

Vector of n Weibull distributed random numbersFORTRAN: REAL,INTENT(OUT) forvsrngweibull

r

DOUBLE PRECISION,INTENT(OUT) forvdrngweibull

C: float* for vsRngWeibull

double* for vdRngWeibull

Return Values








CauchyGenerates Cauchy distributed random values.

Syntax

Fortran:

status = vsrngcauchy( method, stream, n, r, a, beta )

status = vdrngcauchy( method, stream, n, r, a, beta )

2346


C:

status = vsRngCauchy( method, stream, n, r, a, beta );

status = vdRngCauchy( method, stream, n, r, a, beta );

Description

This function generates Cauchy distributed random numbers with displacement a and scalefactor

β, where a, β∈R ; β > 0.



Input Parameters

DescriptionTypeName


VSL_METHOD_SCAUCHY_ICDF

VSL_METHOD_DCAUCHY_ICDF


C: int

method


2347


DescriptionTypeName


stream




n

C: int

Displacement aFORTRAN: REAL,INTENT(IN) for vsrngcauchy

a

DOUBLE PRECISION,INTENT(IN) for vdrngcauchy

C: float for vsRngCauchy

double for vdRngCauchy

Scalefactor βFORTRAN: REAL,INTENT(IN) for vsrngcauchy

beta

DOUBLE PRECISION,INTENT(IN) for vdrngcauchy

C: float for vsRngCauchy

double for vdRngCauchy

Output Parameters

DescriptionTypeName

Vector of n Cauchy distributed random numbersFORTRAN: REAL,INTENT(OUT) forvsrngcauchy

r

DOUBLE PRECISION,INTENT(OUT) forvdrngcauchy

2348


DescriptionTypeName

C: float* for vsRngCauchy

double* for vdRngCauchy

Return Values








RayleighGenerates Rayleigh distributed random values.

Syntax

Fortran:

status = vsrngrayleigh( method, stream, n, r, a, beta )

status = vdrngrayleigh( method, stream, n, r, a, beta )

C:

status = vsRngRayleigh( method, stream, n, r, a, beta );

status = vdRngRayleigh( method, stream, n, r, a, beta );

Description

This function generates Rayleigh distributed random numbers with displacement a and scalefactor


Rayleigh distribution is a special case of Weibull distribution, where the shape parameter α= 2.

2349




Input Parameters

DescriptionTypeName


VSL_METHOD_SRAYLEIGH_ICDF

VSL_METHOD_DRAYLEIGH_ICDF

VSL_METHOD_SRAYLEIGH_ICDF_ACCURATE

VSL_METHOD_DRAYLEIGH_ICDF_ACCURATE


C: int

method



stream



2350


DescriptionTypeName


n

C: int

Displacement aFORTRAN: REAL,INTENT(IN) forvsrngrayleigh

a

DOUBLE PRECISION,INTENT(IN) forvdrngrayleigh

C: float for vsRngRayleigh

double for vdRngRayleigh

Scalefactor βFORTRAN: REAL,INTENT(IN) forvsrngrayleigh

beta

DOUBLE PRECISION,INTENT(IN) forvdrngrayleigh

C: float for vsRngRayleigh

double for vdRngRayleigh

Output Parameters

DescriptionTypeName

Vector of n Rayleigh distributed random numbersFORTRAN: REAL,INTENT(OUT) forvsrngrayleigh

r

DOUBLE PRECISION,INTENT(OUT) forvdrngrayleigh

2351


DescriptionTypeName

C: float* forvsRngRayleigh

double* for vdRngRayleigh

Return Values








LognormalGenerates lognormally distributed randomnumbers.

Syntax

Fortran:

status = vsrnglognormal( method, stream, n, r, a, sigma, b, beta )

status = vdrnglognormal( method, stream, n, r, a, sigma, b, beta )

C:

status = vsRngLognormal( method, stream, n, r, a, sigma, b, beta );

status = vdRngLognormal( method, stream, n, r, a, sigma, b, beta );

Description

This function generates lognormally distributed random numbers with average of distribution

a and standard deviation σ of subject normal distribution, displacement b, and scalefactor β,

where a, σ, b, β∈R ; σ > 0 , β > 0.

2352




Input Parameters

DescriptionTypeName


VSL_METHOD_SLOGNORMAL_ICDF

VSL_METHOD_DLOGNORMAL_ICDF

VSL_METHOD_SLOGNORMAL_ICDF_ACCURATE

VSL_METHOD_DLOGNORMAL_ICDF_ACCURATE


C: int

method



stream




n

2353


DescriptionTypeName

C: int

Average a of the subject normal distributionFORTRAN: REAL,INTENT(IN) forvsrnglognormal

a

DOUBLE PRECISION,INTENT(IN) forvdrnglognormal

C: float forvsRngLognormal

double for vdRngLognormal

Standard deviation σ of the subject normal distributionFORTRAN: REAL,INTENT(IN) forvsrnglognormal

sigma




Displacement bFORTRAN: REAL,INTENT(IN) forvsrnglognormal

b




2354


DescriptionTypeName

Scalefactor βFORTRAN: REAL,INTENT(IN) forvsrnglognormal

beta




Output Parameters

DescriptionTypeName

Vector of n lognormally distributed random numbersFORTRAN: REAL,INTENT(OUT) forvsrnglognormal

r

DOUBLE PRECISION,INTENT(OUT) forvdrnglognormal

C: float* forvsRngLognormal

double* for vdRngLognormal

Return Values






2355




GumbelGenerates Gumbel distributed random values.

Syntax

Fortran:

status = vsrnggumbel( method, stream, n, r, a, beta )

status = vdrnggumbel( method, stream, n, r, a, beta )

C:

status = vsRngGumbel( method, stream, n, r, a, beta );

status = vdRngGumbel( method, stream, n, r, a, beta );

Description

This function generates Gumbel distributed random numbers with displacement a and scalefactor




2356


Input Parameters

DescriptionTypeName


VSL_METHOD_SGUMBEL_ICDF

VSL_METHOD_DGUMBEL_ICDF


C: int

method


FORTRAN: Descriptor of the stream state structureFORTRAN: TYPE(VSL_STREAM_STATE),INTENT(IN)

stream




n

C: int

Displacement aFORTRAN: REAL,INTENT(IN) for vsrnggumbel

a

DOUBLE PRECISION,INTENT(IN) for vdrnggumbel

C: float for vsRngGumbel

double for vdRngGumbel

Scalefactor βFORTRAN: REAL,INTENT(IN) for vsrnggumbel

beta

DOUBLE PRECISION,INTENT(IN) for vdrnggumbel

C: float for vsRngGumbel

double for vdRngGumbel

2357


Output Parameters

DescriptionTypeName

Vector of n random numbers with Gumbel distributionFORTRAN: REAL,INTENT(OUT) forvsrnggumbel

r

DOUBLE PRECISION,INTENT(OUT) forvdrnggumbel

C: float* for vsRngGumbel

double* for vdRngGumbel

Return Values








GammaGenerates gamma distributed random values.

Syntax

Fortran:

status = vsrnggamma( method, stream, n, r, alpha, a, beta )

status = vdrnggamma( method, stream, n, r, alpha, a, beta )

2358


C:

status = vsRngGamma( method, stream, n, r, alpha, a, beta );

status = vdRngGamma( method, stream, n, r, alpha, a, beta );

Description

This function generates random numbers with gamma distribution that has shape parameter

α, displacement a, and scale parameter β, where α, β, and a∈R ; α > 0, β > 0.


where Γ(α) is the complete gamma function.


2359


Input Parameters

DescriptionTypeName


VSL_METHOD_SGAMMA_GNORM

VSL_METHOD_DGAMMA_GNORM

VSL_METHOD_SGAMMA_GNORM_ACCURATE

VSL_METHOD_DGAMMA_GNORM_ACCURATE


C: int

method

Acceptance/rejection method using random numberswith Gaussian distribution. See brief description ofthe method GNORM in Table 10-1


stream




n

C: int

Shape αFORTRAN: REAL,INTENT(IN) for vsrnggamma

alpha

DOUBLE PRECISION,INTENT(IN) for vdrnggamma

C: float for vsRngGamma

double for vdRngGamma

Displacement aFORTRAN: REAL,INTENT(IN) for vsrnggamma

a



2360


DescriptionTypeName


Scalefactor βFORTRAN: REAL,INTENT(IN) for vsrnggamma

beta




Output Parameters

DescriptionTypeName

Vector of n random numbers with gamma distributionFORTRAN: REAL,INTENT(OUT) for vsrnggamma

r

DOUBLE PRECISION,INTENT(OUT) for vdrnggamma

C: float* for vsRngGamma

double* for vdRngGamma

Return Values








2361


BetaGenerates beta distributed random values.

Syntax

Fortran:

status = vsrngbeta( method, stream, n, r, p, q, a, beta )

status = vdrngbeta( method, stream, n, r, p, q, a, beta )

C:

status = vsRngBeta( method, stream, n, r, p, q, a, beta );

status = vdRngBeta( method, stream, n, r, p, q, a, beta );

Description

This function generates random numbers with beta distribution that has shape parameters p

and q, displacement a, and scale parameter β, where p, q, a, and β∈R ; p > 0, q > 0, β> 0.


where B(p, q) is the complete beta function.


2362


Input Parameters

DescriptionTypeName


VSL_METHOD_SBETA_CJA

VSL_METHOD_DBETA_CJA

VSL_METHOD_SBETA_CJA_ACCURATE

VSL_METHOD_DBETA_CJA_ACCURATE


C: int

method

See brief description of the method CJA in Table 10-1


stream




n

C: int

Shape pFORTRAN: REAL,INTENT(IN) for vsrngbeta

p

DOUBLE PRECISION,INTENT(IN) for vdrngbeta

C: float for vsRngBeta

double for vdRngBeta

2363


DescriptionTypeName

Shape qFORTRAN: REAL,INTENT(IN) for vsrngbeta

q




Displacement aFORTRAN: REAL,INTENT(IN) for vsrngbeta

a




Scalefactor βFORTRAN: REAL,INTENT(IN) for vsrngbeta

beta




Output Parameters

DescriptionTypeName

Vector of n random numbers with beta distributionFORTRAN: REAL,INTENT(OUT) for vsrngbeta

r

DOUBLE PRECISION,INTENT(OUT) for vdrngbeta

C: float* for vsRngBeta

double* for vdRngBeta

2364


Return Values








Discrete Distributions

This section describes routines for generating random numbers with discrete distribution.

UniformGenerates random numbers uniformly distributedover the interval [a, b).

Syntax

Fortran:

status = virnguniform( method, stream, n, r, a, b )

C:

status = viRngUniform( method, stream, n, r, a, b );

Description

This function generates random numbers uniformly distributed over the interval [a, b), where

a, b are the left and right bounds of the interval respectively, and a, b∈Z; a < b.

The probability distribution is given by:

2365


Input Parameters

DescriptionTypeName

Generation method; dummy and set to 0 in case ofuniform distribution. The specific value is as follows:

VSL_METHOD_IUNIFORM_STD


C: int

method

Standard method. Currently there is only one methodfor this distribution generator.


stream




n

C: int

Left interval bound aFORTRAN: INTEGER,INTENT(IN)

a

C: int

Right interval bound bFORTRAN: INTEGER,INTENT(IN)

b

2366


DescriptionTypeName

C: int

Output Parameters

DescriptionTypeName

Vector of n random numbers uniformly distributedover the interval [a,b)

FORTRAN: INTEGER,INTENT(OUT)

C: int*

r

Return Values








UniformBitsGenerates integer random values with uniform bitdistribution.

Syntax

Fortran:

status = virnguniformbits( method, stream, n, r )

C:

status = viRngUniformBits( method, stream, n, r );

2367


Description

This function generates integer random values with uniform bit distribution.The generators ofuniformly distributed numbers can be represented as recurrence relations over integer valuesin modular arithmetic. Apparently, each integer can be treated as a vector of several bits. Ina truly random generator, these bits are random, while in pseudorandom generators thisrandomness can be violated. For example, a well known drawback of linear congruentialgenerators is that lower bits are less random than higher bits (for example, see [Knuth81]).For this reason, care should be taken when using this function. Typically, in a 32-bit LCG only24 higher bits of an integer value can be considered random. See VSL Notes for details.

Input Parameters

DescriptionTypeName

Generation method; dummy and set to 0. The specificvalue is as follows:

VSL_METHOD_IUNIFORMBITS_STD


C: int

method

Standard method. Currently there is only one methodfor this distribution generator.


stream




n

C: int

Output Parameters

DescriptionTypeName

FORTRAN: Vector of n random integer numbers. Ifthe stream was generated by a 64 or a 128-bitgenerator, each integer value is represented by two


C: unsigned int*

r

or four elements of r respectively. The number ofbytes occupied by each integer is contained in the

2368


DescriptionTypeName

field wordsize of the structureVSL_BRNG_PROPERTIES. The total number of bits thatare actually used to store the value are contained inthe field nbits of the same structure. See AdvancedService Routines for a more detailed discussion ofVSLBrngProperties.

C: Vector of n random integer numbers. If the streamwas generated by a 64 or a 128-bit generator, eachinteger value is represented by two or four elementsof r respectively. The number of bytes occupied byeach integer is contained in the field WordSize of thestructure VSLBrngProperties. The total number ofbits that are actually used to store the value arecontained in the field NBits of the same structure.See Advanced Service Routines for a more detaileddiscussion of VSLBrngProperties.

Return Values








BernoulliGenerates Bernoulli distributed random values.

Syntax

Fortran:

status = virngbernoulli( method, stream, n, r, p )

2369


C:

status = viRngBernoulli( method, stream, n, r, p );

Description

This function generates Bernoulli distributed random numbers with probability p of a single trialsuccess, where

p∈R; 0 ≤ p ≤ 1.

A variate is called Bernoulli distributed, if after a trial it is equal to 1 with probability of successp, and to 0 with probability 1 — p.


P(X = 1) = p

P(X = 0) = 1 - p


Input Parameters

DescriptionTypeName

Generation method. The specific value is as follows:

VSL_METHOD_IBERNOULLI_ICDF


method

C: int Inverse cumulative distribution function method.


stream



2370


DescriptionTypeName


n

C: int

Success probability p of a trialFORTRAN: DOUBLEPRECISION, INTENT(IN)

p

C: double

Output Parameters

DescriptionTypeName

Vector of n Bernoulli distributed random valuesFORTRAN: INTEGER,INTENT(OUT)

r

C: int*

Return Values








GeometricGenerates geometrically distributed random values.

Syntax

Fortran:

status = virnggeometric( method, stream, n, r, p )

2371


C:

status = viRngGeometric( method, stream, n, r, p );

Description

This function generates geometrically distributed random numbers with probability p of a single

trial success, where p∈R; 0 < p < 1.

A geometrically distributed variate represents the number of independent Bernoulli trialspreceding the first success. The probability of a single Bernoulli trial success is p.


P(X = k) = p·(1 - p)k, k∈ {0,1,2, ... }.


Input Parameters

DescriptionTypeName


VSL_METHOD_IGEOMETRIC_ICDF


method

C: int Inverse cumulative distribution function method.


stream




n

C: int

2372

DescriptionTypeName

Success probability p of a trialFORTRAN: DOUBLEPRECISION, INTENT(IN)

p

C: double

Output Parameters

DescriptionTypeName

Vector of n geometrically distributed random valuesFORTRAN: INTEGER,INTENT(OUT)

r

C: int*

Return Values








BinomialGenerates binomially distributed random numbers.

Syntax

Fortran:

status = virngbinomial( method, stream, n, r, ntrial, p )

C:

status = viRngBinomial( method, stream, n, r, ntrial, p );

2373


Description

This function generates binomially distributed random numbers with number of independent

Bernoulli trials m, and with probability p of a single trial success, where p∈R; 0 ≤ p ≤ 1,

m∈N.

A binomially distributed variate represents the number of successes in m independent Bernoullitrials with probability of a single trial success p.



Input Parameters

DescriptionTypeName


VSL_METHOD_IBINOMIAL_BTPE


method

C: int See brief description of the BTPE method in Table10-1 .

2374


DescriptionTypeName


stream




n

C: int

Number of independent trials mFORTRAN: INTEGER,INTENT(IN)

ntrials

C: int

Success probability p of a single trialFORTRAN: DOUBLEPRECISION, INTENT(IN)

p

C: double

Output Parameters

DescriptionTypeName

Vector of n binomially distributed random valuesFORTRAN: INTEGER,INTENT(OUT)

r

C: int*

Return Values






2375




HypergeometricGenerates hypergeometrically distributed randomvalues.

Syntax

Fortran:

status = virnghypergeometric( method, stream, n, r, l, s, m )

C:

status = viRngHypergeometric( method, stream, n, r, l, s, m );

Description

This function generates hypergeometrically distributed random values with lot size l, size of

sampling s, and number of marked elements in the lot m, where l, m, s∈N∪{0}; l ≥ max(s,m).

Consider a lot of l elements comprising m “marked” and l-m “unmarked“ elements. A trialsampling without replacement of exactly s elements from this lot helps to define thehypergeometric distribution, which is the probability that the group of s elements containsexactly k marked elements.

The probability distribution is given by:)

, k∈ {max(0, s + m - l), ..., min(s, m)}


2376


Input Parameters

DescriptionTypeName


VSL_METHOD_IHYPERGEOMETRIC_H2PE


method

C: int See brief description of the H2PE method in Table10-1


stream




n

C: int

Lot size lFORTRAN: INTEGER,INTENT(IN)

l

C: int

Size of sampling without replacement sFORTRAN: INTEGER,INTENT(IN)

s

C: int

Number of marked elements mFORTRAN: INTEGER,INTENT(IN)

m

2377


DescriptionTypeName

C: int

Output Parameters

DescriptionTypeName

Vector of n hypergeometrically distributed randomvalues


C: int*

r

Return Values








PoissonGenerates Poisson distributed random values.

Syntax

Fortran:

status = virngpoisson( method, stream, n, r, lambda )

C:

status = viRngPoisson( method, stream, n, r, lambda );

2378


Description

This function generates Poisson distributed random numbers with distribution parameter λ,

where λ∈R; λ > 0.


k∈ {0, 1, 2, ...}.


Input Parameters

DescriptionTypeName


VSL_METHOD_IPOISSON_PTPE

VSL_METHOD_IPOISSON_POISNORM


C: int

method

See brief description of the PTPE and POISNORMmethods in Table 10-1 .


stream



2379


DescriptionTypeName


n

C: int

Distribution parameter λFORTRAN: DOUBLEPRECISION, INTENT(IN)

lambda

C: double

Output Parameters

DescriptionTypeName

Vector of n Poisson distributed random valuesFORTRAN: INTEGER,INTENT(OUT)

r

C: int*

Return Values








2380


PoissonVGenerates Poisson distributed random values withvarying mean.

Syntax

Fortran:

status = virngpoissonv( method, stream, n, r, lambda )

C:

status = viRngPoissonV( method, stream, n, r, lambda );

Description

This function generates n Poisson distributed random numbers xi(i = 1, ..., n) with distribution

parameter λi, where λi∈R; λi > 0.



2381


Input Parameters

DescriptionTypeName


VSL_METHOD_IPOISSONV_POISNORM


method

C: int See brief description of the POISNORM method in Table10-1


stream




n

C: int

Array of n distribution parameters λiFORTRAN: DOUBLEPRECISION, INTENT(IN)

lambda

C: double*

Output Parameters

DescriptionTypeName

Vector of n Poisson distributed random valuesFORTRAN: INTEGER,INTENT(OUT)

r

C: int*

Return Values




2382


NegBinomialGenerates random numbers with negative binomialdistribution.

Syntax

Fortran:

status = virngnegbinomial( method, stream, n, r, a, p )

C:

status = viRngNegbinomial( method, stream, n, r, a, p );

Description

This function generates random numbers with negative binomial distribution and distribution

parameters a and p, where p, a∈R; 0 0.

If the first distribution parameter a∈N, this distribution is the same as Pascal distribution. If

a∈N, the distribution can be interpreted as the expected time of a-th success in a sequence ofBernoulli trials, when the probability of success is p.



2383

Input Parameters

DescriptionTypeName


VSL_METHOD_INEGBINOMIAL_NBAR


method

C: int See brief description of the NBAR method in Table10-1

FORTRAN: descriptor of the stream state structure.FORTRAN: TYPE(VSL_STREAM_STATE),INTENT(IN)

stream

C: pointer to the stream state structure



n

C: int

The first distribution parameter aFORTRAN: DOUBLEPRECISION, INTENT(IN)

a

C: double

The second distribution parameter pFORTRAN: DOUBLEPRECISION, INTENT(IN)

p

C: double

2384


Output Parameters

DescriptionTypeName

Vector of n random values with negative binomialdistribution.


C: int*

r

Return Values








Advanced Service Routines

This section describes service routines for registering a user-designed basic generator(RegisterBrng) and for obtaining properties of the previously registered basic generators(GetBrngProperties). See VSL Notes (“Basic Generators” section of VSL Structure chapter)for substantiation of the need for several basic generators including user-defined BRNGs.

2385


Data types

The Avdanced Service routines refer to a structure defining the properties of the basic generator.This structure is described in Fortran as follows:TYPE VSL_BRNG_PROPERTIES

INTEGER streamstatesize

INTEGER nseeds

INTEGER includeszero

INTEGER wordsize

INTEGER nbits

INTEGER nitstream

INTEGER sbrng

INTEGER dbrng

INTEGER ibrng

END TYPE VSL_BRNG_PROPERTIES

The C version is as follows:typedef struct _VSLBRngProperties {

int StreamStateSize;

int NSeeds;

int IncludesZero;

int WordSize;

int NBits;

InitStreamPtr InitStream;

sBRngPtr sBRng;

dBRngPtr dBRng;

iBRngPtr iBRng;

} VSLBRngProperties;

The following table provides brief descriptions of the fields engaged in the above structure:

2386


Table 10-12 Field Descriptions

Short DescriptionField

The size, in bytes, of the stream state structure for a givenbasic generator.

FORTRAN: streamstatesize

C: StreamStateSize

The number of 32-bit initial conditions (seeds) necessary toinitialize the stream state structure for a given basicgenerator.

FORTRAN: nseeds

C: NSeeds

Flag value indicating whether the generator can produce arandom 0.

FORTRAN: includeszero

C: IncludesZero

Machine word size, in bytes, used in integer-valuecomputations. Possible values: 4, 8, and 16 for 32, 64, and128-bit generators, respectively.

FORTRAN: wordsize

C: WordSize

The number of bits required to represent a random value ininteger arithmetic. Note that, for instance, 48-bit randomvalues are stored to 64-bit (8 byte) memory locations. In

FORTRAN: nbits

C: NBits

this case, wordsize/WordSize is equal to 8 (number ofbytes used to store the random value), while nbits/NBitscontains the actual number of bits occupied by the value (inthis example, 48).

Contains the pointer to the initialization routine of a givenbasic generator.

FORTRAN: initstream

C: InitStream

Contains the pointer to the basic generator of single precisionreal numbers uniformly distributed over the interval (a,b)(real in FORTRAN and float in C).

FORTRAN: sbrng

C: sBRng

Contains the pointer to the basic generator of doubleprecision real numbers uniformly distributed over the interval(a,b) (double PRECISION in FORTRAN and double in C).

FORTRAN: dbrng

C: dBRng

Contains the pointer to the basic generator of integernumbers with uniform bit distribution1 (INTEGER in FORTRANand unsigned int in C).

FORTRAN: ibrng

C: iBRng

1A specific generator that permits operations over single bits and bit groups of random numbers.

2387


RegisterBrngRegisters user-defined basic generator.

Syntax

Fortran:

brng = vslregisterbrng( properties )

C:

brng = vslRegisterBrng( &properties );

Description

An example of a registration procedure can be found in the respective directory of VSL examples.

Input Parameters

DescriptionTypeName

Pointer to the structure containing properties of thebasic generator to be registered

FORTRAN:TYPE(VSL_BRNG_PROPERTIES),INTENT(IN)

properties

C: VSLBrngProperties*

Output Parameters

DescriptionTypeName

Number (index) of the registered basic generator;used for identification. Negative values indicate theregistration error.


C: int

brng

Return Values


Registration cannot be completed due to lackof free entries in the table of registered BRNGs.

VSL_ERROR_BRNG_TABLE_FULL

Bad value in StreamStateSize field.VSL_ERROR_BAD_STREAM_STATE_SIZE

2388


Bad value in WordSize field.VSL_ERROR_BAD_WORD_SIZE

Bad value in NSeeds field.VSL_ERROR_BAD_NSEEDS

Bad value in NBits field.VSL_ERROR_BAD_NBITS

At least one of the fields iBrng, dBrng, sBrng orInitStream is a NULL pointer.

VSL_ERROR_NULL_PTR

GetBrngPropertiesReturns structure with properties of a given basicgenerator.

Syntax

Fortran:

status = vslgetbrngproperties( brng, properties )

C:

status = vslGetBrngProperties( brng, &properties );

Input Parameters

DescriptionTypeName

Number (index) of the registered basic generator;used for identification. See specific values in Table10-2 . Negative values indicate the registration error.


C: int

brng

Output Parameters

DescriptionTypeName

Pointer to the structure containing properties of thegenerator with number brng

FORTRAN:TYPE(VSL_BRNG_PROPERTIES),INTENT(OUT)

properties

C: VSLBrngProperties*

2389


Return Values



Formats for User-Designed Generators

To register a user-designed basic generator using RegisterBrng function, you need to passthe pointer iBrng to the integer-value implementation of the generator; the pointers sBrngand dBrng to the generator implementations for single and double precision values, respectively;and pass the pointer InitStream to the stream initialization routine. See belowrecommendations on defining such functions with input and output arguments. An example ofthe registration procedure for a user-designed generator can be found in the respective directoryof VSL examples.

The respective pointers are defined as follows:typedef int (*InitStreamPtr)( int method, void * stream, int n, constunsigned int params[] );

typedef int (*sBRngPtr)( void * stream, int n, float r[], float a, floatb );

typedef int (*dBRngPtr)( void * stream, int n, double r[], double a, doubleb );

typedef int (*iBRngPtr)( void * stream, int n, unsigned int r[] );

InitStream

C:

int MyBrngInitStream( int method, VSLStreamStatePtr stream,

int n, const unsigned int params[] );

{

/* Initialize the stream */

...

} /* MyBrngInitStream */

2390


Description

The initialization routine of a user-designed generator must initialize stream according to thespecified initialization method, initial conditions params and the argument n. The value ofmethod determines the initialization method to be used.

• If method is equal to 1, the initialization is by the standard generation method, which mustbe supported by all basic generators. In this case the function assumes that the streamstructure was not previously initialized. The value of n is used as the actual number of 32-bitvalues passed as initial conditions through params. Note, that the situation when the actualnumber of initial conditions passed to the function is not sufficient to initialize the generatoris not an error. Whenever it occurs, the basic generator must initialize the missing conditionsusing default settings.

• If method is equal to 2, the generation is by the leapfrog method, where n specifies thenumber of computational nodes (independent streams). Here the function assumes that thestream was previously initialized by the standard generation method. In this case paramscontains only one element, which identifies the computational node. If the generator doesnot support the leapfrog method, the function must return the error codeVSL_ERROR_LEAPFROG_UNSUPPORTED.

• If method is equal to 3, the generation is by the block-splitting method. Same as above,the stream is assumed to be previously initialized by the standard generation method;params is not used, n identifies the number of skipped elements. If the generator does notsupport the block-splitting method, the function must return the error codeVSL_ERROR_SKIPAHEAD_UNSUPPORTED.

For a more detailed description of the leapfrog and the block-splitting methods, refer to thedescription of LeapfrogStream and SkipAheadStream, respectively.

Stream state structure is individual for every generator. However, each structure has a numberof fields that are the same for all the generators:

C:

typedef struct

{

unsigned int Reserved1[2];

unsigned int Reserved2[2];

[fields specific for the given generator]

} MyStreamState;

2391


The fields Reserved1 and Reserved2 are reserved for private needs only, and must not bemodified by the user. When including specific fields into the structure, follow the rules below:

• The fields must fully describe the current state of the generator. For example, the state ofa linear congruential generator can be identified by only one initial condition;

• If the generator can use both the leapfrog and the block-splitting methods, additional fieldsshould be introduced to identify the independent streams. For example, in LCG(a, c, m),apart from the initial conditions, two more fields should be specified: the value of themultiplier ak and the value of the increment (ak-1)c/(a-1).

For a more detailed discussion, refer to [Knuth81], and [Gentle98]. An example of theregistration procedure can be found in the respective directory of VSL examples.

iBRng

C:

void iMyBrng( VSLStreamStatePtr stream, int n,

unsigned int r[] )

{

int i; /* Loop variable */

/* Generating integer random numbers */

/* Pay attention to word size needed to

store only random number */

for( i = 0; i < n; i++)

{

r[i] = ...;

}

/* Update stream state */

...

return errcode;

} /* iMyBrng */

2392

NOTE. When using 64 and 128-bit generators, consider digit capacity to store thenumbers to the random vector r correctly. For example, storing one 64-bit value requirestwo elements of r, the first to store the lower 32 bits and the second to store the higher32 bits. Similarly, use 4 elements of r to store a 128-bit value.

sBRng

C:

void sMyBrng( VSLStreamStatePtr stream, int n, float r[],

float a, float b )

{


/* Generating float (a,b) random numbers */

for ( i = 0; i < n; i++ )

{

r[i] = ...;

}


...

return errcode;

} /* sMyBrng */

2393

dBRng

C:

void dMyBrng( VSLStreamStatePtr stream, int n, double r[],

double a, double b )

{


/* Generating double (a,b) random numbers */

for ( i = 0; i < n; i++ )

{

r[i] = ...;

}


...

return errcode;

} /* dMyBrng */

Convolution and Correlation

Overview

VSL provides a set of routines intended to perform linear convolution and correlationtransformations for single and double precision data.

For correct definition of implemented operations, see Mathematical Notation and Definitionssection.

The current implementation provides:

• Fourier algorithms for one-dimensional single and double precision data

• Fourier algorithms for multi-dimensional single and double precision data

• Direct algorithms for one-dimensional single and double precision data.

• Direct algorithms for multi-dimensional single and double precision data.

2394

One-dimensional algorithms cover the following functions from the IBM* ESSL library:SCONF, SCORF

SCOND, SCORD

SDCON, SDCOR

DDCON, DDCOR

SDDCON, SDDCOR.

Special wrappers are designed to simulate these ESSL functions. The wrappers are providedas sample sources for FORTRAN and C. To reuse them, use the following directories:${MKL}/examples/vslc/essl/vsl_wrappers

${MKL}/examples/vslf/essl/vsl_wrappers

Additionally, you can browse the examples demonstrating the calculation of the ESSL functionsthrough the wrappers. You can find the examples in the following directories:${MKL}/examples/vslc/essl

${MKL}/examples/vslf/essl

Convolution and correlation API provides interfaces for FORTRAN-90 and C/89 languages. Youmay use the C/89 interface also with later versions of C or C++, or FORTRAN-90 interface withprograms written in FORTRAN-95. Note that there is no FORTRAN-77 support.

For users of the C/C++ and FORTRAN languages, the mkl_vsl.h and mkl_vsl.fi headers areprovided. Both header files are found under the directory:

${MKL}/include

See more details about the FORTRAN header in Random Number Generators section of thischapter.

Convolution and correlation API is implemented through task objects, or tasks. Task object isa data structure, or descriptor, which holds parameters that determine the specific convolutionor correlation operation. Such parameters may be precision, type, and number of dimensionsof user data, an identifier of the computation algorithm to be used, shapes of data arrays, andso on.

All Intel MKL convolution and correlation routines process task objects in one way or another:either create a new task descriptor, change the parameter settings, compute mathematicalresults of the convolution or correlation using the stored parameters, or perform other operations.Accordingly, all routines are split into the following groups:

Task Constructors - routines that create a new task object descriptor and set up most commonparameters.

Task Editors - routines that can set or modify some parameter settings in the existing taskdescriptor.

2395


Task Execution Routines - compute results of the convolution or correlation operation over theactual input data, using the operation parameters held in the task descriptor.

Task Copy - routines used to make several copies of the task descriptor.

Task Destructors - routines that delete task objects and free the memory.

When the task is executed or copied for the first time, a special process runs which is calledtask commitment. During this process, consistency of task parameters is checked and therequired work data are prepared. If the parameters are consistent, the task is tagged ascommitted successfully. The task remains committed until you edit its parameters. Hence, thetask can be executed multiple times after a single commitment process. Since the taskcommitment process may include costly intermediate calculations such as preparation of Fouriertransform of input data, launching the process only once can help speed up overall performance.

Naming Conventions

The names of FORTRAN routines in the convolution and correlation API are written in lowercase,while the names of FORTRAN types and constants are written in uppercase. The names are notcase-sensitive.

In C, the names of routines, types, and constants are case-sensitive and can be lowercase anduppercase.

The names of routines have the following structure:

vsl[datatype]{Conv|Corr}<base name> for C-interface

vsl[datatype]{conv|corr}<base name> for FORTRAN-interface

where vsl is a prefix indicating that the routine belongs to Vector Statistical Library of Intel®

MKL.

The field [datatype] is optional. If present, the symbol specifies the type of the input andoutput data and can be either s (for single precision real type) or d (for double precision realtype).

The prefix Conv or Corr specifies whether the routine refers to convolution or correlation task,respectively.

The <base name> field specifies a particular functionality that the routine is designed for, forexample, NewTask, DeleteTask.

NOTE. In this document, routines are often referred to by their base name when thisdoes not lead to ambiguity. In the routine reference, the full name is always used inprototypes and code examples.

2396

Data Types

All convolution or correlation routines use the following types for specifying data objects:

Data ObjectType

Pointer to a task descriptor for convolutionFORTRAN:TYPE(VSL_CONV_TASK)

C: VSLConvTaskPtr

Pointer to a task descriptor for correlationFORTRAN:TYPE(VSL_CORR_TASK)

C: VSLCorrTaskPtr

Input/output user data in single precisionFORTRAN: REAL(KIND=4)

C: float

Input/output user data in double precisionFORTRAN: REAL(KIND=8)

C: double

All other dataFORTRAN: INTEGER

C: int

Generic integer type (without specifying the byte size) is used for all integer data.

NOTE. The actual size of the generic integer type is platform-dependent. The appropriatebyte size for integers must be chosen at the stage of compiling your software.

Parameters

Basic parameters held by the task descriptor are assigned values when the task object iscreated, copied, or modified by task editors. Parameters of the correlation or convolution taskare initially set up by task constructors when the task object is created. Parameter changes oradditional settings are made by task editors. More parameters which define location of the databeing convolved need to be specified when the task execution routine is invoked.

2397


According to how the parameters are passed or assigned values, all of them can be categorizedas either explicit (directly passed as routine parameters when a task object is created orexecuted) or optional (assigned some default or implicit values during task construction).

The following table lists all applicable parameters used in the Intel MKL convolution andcorrelation API.

Table 10-13 Convolution and Correlation Task Parameters

DescriptionDefault ValueLabel

TypeCategoryName

Specifies whether the task relates toconvolution or correlation

Implied by theconstructorname

integerexplicitjob

Specifies the type (real or complex) ofthe input/output data. Set to real in thecurrent version.


integerexplicittype

Specifies precision (single or double) ofthe input/output data to be provided inarrays x,y,z.


integerexplicitprecision

Specifies whether the task relates tocomputing linear or circularconvolution/correlation

“linear”integeroptionalkind

Specifies whether theconvolution/correlation computationshould be done via Fourier transforms,

Noneintegerexplicitmode

or by a direct method, or byautomatically choosing between thetwo. See SetMode for the list of namedconstants for this parameter.

Hints at a particular computationmethod if several methods are availablefor the given mode. Setting this

“auto“integeroptionalmethod

parameter to “auto” means thatsoftware will choose the best availablemethod.

2398


DescriptionDefault ValueLabel

TypeCategoryName

Specifies precision of internalcalculations. Can enforce doubleprecision calculations even when

Set equal to thevalue ofprecision

integeroptionalinternal_precision

input/output data are single precision.See SetInternalPrecision for thelist of named constants for thisparameter.

Specifies the rank (number ofdimensions) of the user data providedin arrays x,y,z. Can be in the rangefrom 1 to 7.

Noneintegerexplicitdims

Specify input data arrays. See DataAllocation for more information.

Nonerealarrays

explicitx,y

Specifies output data array. See DataAllocation for more information.

Nonerealarray

explicitz

Define shapes of the arrays x, y, z. SeeData Allocation for more information.

Noneintegerarrays

explicitxshape,yshape,zshape

Define strides within arrays x, y, z, thatis specify the physical location of theinput and output data in these arrays.See Data Allocation for moreinformation.

Noneintegerarrays

explicitxstride,ystride,zstride

Defines the first element of themathematical result that will be storedto output array z. See SetStart andData Allocation for more information.

Undefinedintegerarray

optionalstart

Defines how to thin out themathematical result that will be storedto output array z. See SetDecimationand Data Allocation for moreinformation.

Undefinedintegerarray

optionaldecimation

2399


Users of the C or C++ language may pass the NULL pointer instead of either or all of theparameters xstride, ystride, or zstride for multi-dimensional calculations. In this case,the software assumes the dense data allocation for the arrays x, y, or z due to theFORTRAN-style “by columns” representation of multi-dimensional arrays.

Task Status and Error Reporting

Task status is an integer value which is zero if no error has been detected while processing thetask, or a specific non-zero error code otherwise. Negative status values are used for errors,and positive values are reserved for warnings.

An error can be caused by bad parameter values, system fault like memory allocation failure,or can be an internal error self-detected by the software.

Each task descriptor contains the current status of the task. When creating a task object,constructor assigns the VSL_STATUS_OK status to the task. When processing the task afterwards,other routines such as editors of executors can change the task status if an error occurs andwrite a corresponding error code into the task status field.

Note that at the stage of creating a task or editing its parameters, the set of parameters maybe inconsistent. The parameter consistency check is only performed during the task commitmentoperation which is implicitly invoked before task execution or task copying. If an error is detectedat this stage, task execution or task copying is terminated and the task descriptor saves thecorresponding error code. Once an error occurs, any further attempts to process that taskdescriptor will be terminated and the task will keep the same error code.

Normally, every convolution or correlation function (except DeleteTask) returns the statusassigned to the task while performing the function operation.

The status codes are given symbolic names defined in the respective header files. For C/C++,these names are defined as macros via #define statements, and for FORTRAN as integerconstants via PARAMETER operators.

If there is no error, the VSL_STATUS_OK is returned, which is defined as zero:

#define, VSL_STATUS_OK 0C/C++:

INTEGER,PARAMETER:: VSL_STATUS_OK = 0F90/F95:

In case of an error, a non-zero error code is returned, which signals about the origin of thefailure. The following status codes for the convolution/correlation error codes are pre-definedin the header files for both C/C++ and FORTRAN languages.

Convolution/Correlation Status Codes and Messages

MessageStatus Code

Requested functionality is not implemented.VSL_CC_ERROR_NOT_IMPLEMENTED,

2400


MessageStatus Code

Memory allocation failure.VSL_CC_ERROR_ALLOCATION_FAILURE

Task descriptor is corrupted.VSL_CC_ERROR_BAD_DESCRIPTOR

A service function has failed.VSL_CC_ERROR_SERVICE_FAILURE

Failure while editing the task.VSL_CC_ERROR_EDIT_FAILURE

You cannot edit this parameter.VSL_CC_ERROR_EDIT_PROHIBITED

Task committment has failed.VSL_CC_ERROR_COMMIT_FAILURE

Failure while copying the task.VSL_CC_ERROR_COPY_FAILURE

Failure while deleting the task.VSL_CC_ERROR_DELETE_FAILURE

Bad argument or task parameter.VSL_CC_ERROR_BAD_ARGUMENT

Bad parameter: job.VSL_CC_ERROR_JOB

Bad parameter: kind.SL_CC_ERROR_KIND

Bad parameter: mode.VSL_CC_ERROR_MODE

Bad parameter: method.VSL_CC_ERROR_METHOD

Bad parameter: type.VSL_CC_ERROR_TYPE

Bad parameter: external_precision.VSL_CC_ERROR_EXTERNAL_PRECISION

Bad parameter: internal_precision.VSL_CC_ERROR_INTERNAL_PRECISION

Incompatible external/internal precisions.VSL_CC_ERROR_PRECISION

Bad parameter: dims.VSL_CC_ERROR_DIMS

Bad parameter: xshape.VSL_CC_ERROR_XSHAPE

Bad parameter: yshape.VSL_CC_ERROR_YSHAPE

2401


MessageStatus Code

Callback function for an abstract BRNG returns aninvalid number of updated entries in a buffer, thatis, < 0 or >nmax.

Bad parameter: zshape.VSL_CC_ERROR_ZSHAPE

Bad parameter: xstride.VSL_CC_ERROR_XSTRIDE

Bad parameter: ystride.VSL_CC_ERROR_YSTRIDE

Bad parameter: zstride.VSL_CC_ERROR_ZSTRIDE

Bad parameter: x.VSL_CC_ERROR_X

Bad parameter: y.VSL_CC_ERROR_Y

Bad parameter: z.VSL_CC_ERROR_Z

Bad parameter: start.VSL_CC_ERROR_START

Bad parameter: decimation.VSL_CC_ERROR_DECIMATION

Some other error.VSL_CC_ERROR_OTHER

Task Constructors

Task constructors are routines intended for creating a new task descriptor and setting up basicparameters. This means that no additional parameter adjustment is typically required and otherroutines can use the task object.

Intel® MKL implementation of the convolution and correlation API provides two different formsof constructors: a general form and an X-form. X-form constructors work in the same way asthe general form but also assign particular data to the first operand vector used in convolutionor correlation operation (stored in array x).

Using X-form constructors is recommended when you need to compute multiple convolutionsor correlations with the same data vector held in array x against different vectors held in arrayy. This helps improve performance by eliminating unnecessary overhead in repeated computationof intermediate data required for the operation.

2402

For each constructor routine there is also an associated one-dimensional version which exploitsthe algorithmic and computational benefits provided by the simplicity of the data structures forone-dimensional case.

NOTE. If constructor fails to create a task descriptor, it returns NULL task pointer.

The Table 10-14 lists available task constructors:

Table 10-14 Task Constructors

DescriptionRoutine

Creates a new convolution or correlation task descriptorfor a multidimensional case.

NewTask

Creates a new convolution or correlation task descriptorfor a one-dimensional case.

NewTask1D

Creates a new convolution or correlation task descriptoras an X-form for a multidimensional case.

NewTaskX

Creates a new convolution or correlation task descriptoras an X-form for a one-dimensional case.

NewTaskX1D

NewTaskCreates a new convolution or correlation taskdescriptor for multidimensional case.

Syntax

Fortran:

status = vslsconvnewtask(task, mode, dims, xshape, yshape, zshape)

status = vsldconvnewtask(task, mode, dims, xshape, yshape, zshape)

status = vslscorrnewtask(task, mode, dims, xshape, yshape, zshape)

status = vsldcorrnewtask(task, mode, dims, xshape, yshape, zshape)

2403


C:

status = vslsConvNewTask(task, mode, dims, xshape, yshape, zshape);

status = vsldConvNewTask(task, mode, dims, xshape, yshape, zshape);

status = vslsCorrNewTask(task, mode, dims, xshape, yshape, zshape);

status = vsldCorrNewTask(task, mode, dims, xshape, yshape, zshape);

Description

Each NewTask constructor creates a new convolution or correlation task descriptor with theuser specified values for explicit parameters. The optional parameters are set to their defaultvalues (see Table 10-13 ).

The parameters xshape, yshape, and zshape define the shapes of the input and output dataprovided by the arrays x, y, and z, respectively. Each shape parameter is an array of integerswith its length equal to the value of dims. You explicitly assign the shape parameters whencalling the constructor. If the value of the parameter dims is 1, then xshape, yshape, zshapeare equal to the number of elements read from the arrays x and y or stored to the array z.Note that values of shape parameters may differ from physical shapes of arrays x, y, and z ifnon-trivial strides are assigned.

If constructor fails to create a task descriptor, it returns NULL task pointer.

Input Parameters

DescriptionTypeName

Specifies whether convolution/correlation calculationmust be performed by using a direct algorithm orthrough Fourier transform of the input data. See Table10-16 for a list of possible values.

FORTRAN: INTEGER

C: int

mode

Rank of user data. Specifies number of dimensionsfor the input and output arrays x, y, and z used duringthe execution stage. Must be in the range from 1 to7. The value is explicitly assigned by the constructor.

FORTRAN: INTEGER

C: int

dims

Defines the shape of the input data for the sourcearray x. See Data Allocation for more information.

FORTRAN: INTEGER,DIMENSION(*)

C: int[]

xshape

2404


DescriptionTypeName

Defines the shape of the input data for the sourcearray y. See Data Allocation for more information.


C: int[]

yshape

Defines the shape of the output data to be stored inarray z. See Data Allocation for more information.


C: int[]

zshape

Output Parameters

DescriptionTypeName

Pointer to the task descriptor if created successfullyor NULL pointer otherwise.

FORTRAN:TYPE(VSL_CONV_TASK) forvslsconvnewtask,vsldconvnewtask

task

TYPE(VSL_CORR_TASK) forvslscorrnewtask,vsldcorrnewtask

C: VSLConvTaskPtr* forvslsConvNewTask,vsldConvNewTask

VSLCorrTaskPtr* forvslsCorrNewTask,vsldCorrNewTask

Set to VSL_STATUS_OK if the task is createdsuccessfully or set to non-zero error code otherwise.

FORTRAN: INTEGER

C: int

status

2405


NewTask1DCreates a new convolution or correlation taskdescriptor for one-dimensional case.

Syntax

Fortran:

status = vslsconvnewtask1d(task, mode, xshape, yshape, zshape)

status = vsldconvnewtask1d(task, mode, xshape, yshape, zshape)

status = vslscorrnewtask1d(task, mode, xshape, yshape, zshape)

status = vsldcorrnewtask1d(task, mode, xshape, yshape, zshape)

C:

status = vslsConvNewTask1D(task, mode, xshape, yshape, zshape);

status = vsldConvNewTask1D(task, mode, xshape, yshape, zshape);

status = vslsCorrNewTask1D(task, mode, xshape, yshape, zshape);

status = vsldCorrNewTask1D(task, mode, xshape, yshape, zshape);

Description

Each NewTask1D constructor creates a new convolution or correlation task descriptor with theuser specified values for explicit parameters. The optional parameters are set to their defaultvalues (see Table 10-13 ). Unlike NewTask, these routines represent a special one-dimensionalversion of the constructor which assumes that the value of the parameter dims is 1. Theparameters xshape, yshape, and zshape are equal to the number of elements read from thearrays x and y or stored to the array z. You explicitly assign the shape parameters when callingthe constructor.

Input Parameters

DescriptionTypeName


FORTRAN: INTEGER

C: int

mode

2406


DescriptionTypeName

Defines the length of the input data sequence for thesource array x. See Data Allocation for moreinformation.

FORTRAN: INTEGER

C: int

xshape

Defines the length of the input data sequence for thesource array y. See Data Allocation for moreinformation.

FORTRAN: INTEGER

C: int

yshape

Defines the length of the output data sequence to bestored in array z. See Data Allocation for moreinformation.

FORTRAN: INTEGER

C: int

zshape

Output Parameters

DescriptionTypeName


FORTRAN:TYPE(VSL_CONV_TASK) forvslsconvnewtask1d,vsldconvnewtask1d

task

TYPE(VSL_CORR_TASK) forvslscorrnewtask1d,vsldcorrnewtask1d

C: VSLConvTaskPtr* forvslsConvNewTask1D,vsldConvNewTask1D

VSLCorrTaskPtr* forvslsCorrNewTask1D,vsldCorrNewTask1D


FORTRAN: INTEGER

C: int

status

2407


NewTaskXCreates a new convolution or correlation taskdescriptor for multidimensional case and assignssource data to the first operand vector.

Syntax

Fortran:

status = vslsconvnewtaskx(task, mode, dims, xshape, yshape, zshape, x, xstride)

status = vsldconvnewtaskx(task, mode, dims, xshape, yshape, zshape, x, xstride)

status = vslscorrnewtaskx(task, mode, dims, xshape, yshape, zshape, x, xstride)

status = vsldcorrnewtaskx(task, mode, dims, xshape, yshape, zshape, x, xstride)

C:

status = vslsConvNewTaskX(task, mode, dims, xshape, yshape, zshape, x,xstride);

status = vsldConvNewTaskX(task, mode, dims, xshape, yshape, zshape, x,xstride);

status = vslsCorrNewTaskX(task, mode, dims, xshape, yshape, zshape, x,xstride);

status = vsldCorrNewTaskX(task, mode, dims, xshape, yshape, zshape, x,xstride);

Description

Each NewTaskX constructor creates a new convolution or correlation task descriptor with theuser specified values for explicit parameters. The optional parameters are set to their defaultvalues (see Table 10-13 ).

Unlike NewTask, these routines represent the so called X-form version of the constructor, whichmeans that in addition to creating the task descriptor they assign particular data to the firstoperand vector in array x used in convolution or correlation operation. The task descriptorcreated by the NewTaskX constructor keeps the pointer to the array x all the time, that is, untilthe task object is deleted by one of the destructor routines (see DeleteTask).

2408


Using this form of constructors is recommended when you need to compute multiple convolutionsor correlations with the same data vector in array x against different vectors in array y. Thishelps improve performance by eliminating unnecessary overhead in repeated computation ofintermediate data required for the operation.

The parameters xshape, yshape, and zshape define the shapes of the input and output dataprovided by the arrays x, y, and z, respectively. Each shape parameter is an array of integerswith its length equal to the value of dims. You explicitly assign the shape parameters whencalling the constructor. If the value of the parameter dims is 1, then xshape, yshape, andzshape are equal to the number of elements read from the arrays x and y or stored to thearray z. Note that values of shape parameters may differ from physical shapes of arrays x, y,and z if non-trivial strides are assigned.

The stride parameter xstride specifies the physical location of the input data in the array x.In a one-dimensional case, stride is an interval between locations of consecutive elements ofthe array. For example, if the value of the parameter xstride is s, then only every sth elementof the array x will be used to form the input sequence. The stride value must be positive ornegative but not zero.

Input Parameters

DescriptionTypeName


FORTRAN: INTEGER

C: int

mode

Rank of user data. Specifies number of dimensionsfor the input and output arrays x, y, and z used duringthe execution stage. Must be in the range from 1 to7. The value is explicitly assigned by the constructor.

FORTRAN: INTEGER

C: int

dims

Defines the shape of the input data for the sourcearray x. See Data Allocation for more information.


C: int[]

xshape

Defines the shape of the input data for the sourcearray y. See Data Allocation for more information.


C: int[]

yshape

2409


DescriptionTypeName

Defines the shape of the output data to be stored inarray z. See Data Allocation for more information.


C: int[]

zshape

Pointer to the array containing input data for the firstoperand vector. See Data Allocation for moreinformation.

FORTRAN: REAL(KIND=4),DIMENSION (*) for singleprecision flavors,

REAL(KIND=8), DIMENSION(*) for double precisionflavors

x

C: float[] for singleprecision flavors

double[] for double precisionflavors

Strides for input data in the array x.FORTRAN: INTEGER,DIMENSION (*)

xstride

C: int[]

Output Parameters

DescriptionTypeName


FORTRAN:TYPE(VSL_CONV_TASK) forvslsconvnewtaskx,vsldconvnewtaskx

task

TYPE(VSL_CORR_TASK) forvslscorrnewtaskx,vsldcorrnewtaskx

C: VSLConvTaskPtr* forvslsConvNewTaskX,vsldConvNewTaskX

2410


DescriptionTypeName

VSLCorrTaskPtr* forvslsCorrNewTaskX,vsldCorrNewTaskX


FORTRAN: INTEGER

C: int

status

NewTaskX1DCreates a new convolution or correlation taskdescriptor for one-dimensional case and assignssource data to the first operand vector.

Syntax

Fortran:

status = vslsconvnewtaskx1d(task, mode, xshape, yshape, zshape, x, xstride)

status = vsldconvnewtaskx1d(task, mode, xshape, yshape, zshape, x, xstride)

status = vslscorrnewtaskx1d(task, mode, xshape, yshape, zshape, x, xstride)

status = vsldcorrnewtaskx1d(task, mode, xshape, yshape, zshape, x, xstride)

C:

status = vslsConvNewTaskX1D(task, mode, xshape, yshape, zshape, x, xstride);

status = vsldConvNewTaskX1D(task, mode, xshape, yshape, zshape, x, xstride);

status = vslsCorrNewTaskX1D(task, mode, xshape, yshape, zshape, x, xstride);

status = vsldCorrNewTaskX1D(task, mode, xshape, yshape, zshape, x, xstride);

Description

Each NewTaskX1D constructor creates a new convolution or correlation task descriptor with theuser specified values for explicit parameters. The optional parameters are set to their defaultvalues (see Table 10-13 ).

2411


These routines represent a special one-dimensional version of the so called X-form of theconstructor. This assumes that the value of the parameter dims is 1 and that in addition tocreating the task descriptor, constructor routines assign particular data to the first operandvector in array x used in convolution or correlation operation. The task descriptor created bythe NewTaskX1D constructor keeps the pointer to the array x all the time, that is, until the taskobject is deleted by one of the destructor routines (see DeleteTask).

Using this form of constructors is recommended when you need to compute multiple convolutionsor correlations with the same data vector in array x against different vectors in array y. Thishelps improve performance by eliminating unnecessary overhead in repeated computation ofintermediate data required for the operation.

The parameters xshape, yshape, and zshape are equal to the number of elements read fromthe arrays x and y or stored to the array z. You explicitly assign the shape parameters whencalling the constructor.

The stride parameters xstride specifies the physical location of the input data in the array xand is an interval between locations of consecutive elements of the array. For example, if thevalue of the parameter xstride is s, then only every sth element of the array x will be usedto form the input sequence. The stride value must be positive or negative but not zero.

Input Parameters

DescriptionTypeName


FORTRAN: INTEGER

C: int

mode

Defines the length of the input data sequence for thesource array x. See Data Allocation for moreinformation.

FORTRAN: INTEGER

C: int

xshape

Defines the length of the input data sequence for thesource array y. See Data Allocation for moreinformation.

FORTRAN: INTEGER

C: int

yshape

Defines the length of the output data sequence to bestored in array z. See Data Allocation for moreinformation.

FORTRAN: INTEGER

C: int

zshape

2412


DescriptionTypeName

Pointer to the array containing input data for the firstoperand vector. See Data Allocation for moreinformation.

FORTRAN: REAL(KIND=4),DIMENSION (*) for singleprecision flavors,

REAL(KIND=8),DIMENSION(*) for double precisionflavors

x

C: float[] for singleprecision flavors

double[] for double precisionflavors

Stride for input data sequence in the arrayx.FORTRAN: INTEGER

C: int

xstride

Output Parameters

DescriptionTypeName


FORTRAN:TYPE(VSL_CONV_TASK) forvslsconvnewtaskx1d,vsldconvnewtaskx1d

task

TYPE(VSL_CORR_TASK) forvslscorrnewtaskx1d,vsldcorrnewtaskx1d

C: VSLConvTaskPtr* forvslsConvNewTaskX1D,vsldConvNewTaskX1D

VSLCorrTaskPtr* forvslsCorrNewTaskX1D,vsldCorrNewTaskX1D

2413


DescriptionTypeName


FORTRAN: INTEGER

C: int

status

Task Editors

Task editors in convolution and correlation API of the Intel MKL are routines intended for settingup or changing the following task parameters (see Table 10-13 ):

• mode

• internal_precision

• start

• decimation.

For setting up or changing each of the above parameters, a separate routine exists.

NOTE. Fields of the task descriptor structure are accessible only through the set of taskeditor routines provided with the software.

After applying any of the editor routines to change the task descriptor settings, the task losesits commitment status and will go through the full commitment process again during the nextexecution or copy operation. This is motivated by the fact that the currently stored work datacomputed during the last commitment process may become invalid with respect to newparameter settings. For more information about task commitment, see Overview.

Table 10-15 lists available task editors.

Table 10-15 Task Editors

DescriptionRoutine

Changes the value of the parameter mode for the operationof convolution or correlation.

SetMode

Changes the value of the parameter internal_precisionfor the operation of convolution or correlation.

SetInternalPrecision

Sets the value of the parameter start for the operation ofconvolution or correlation.

SetStart

2414


DescriptionRoutine

Sets the value of the parameter decimation for theoperation of convolution or correlation.

SetDecimation

NOTE. You can use the NULL task pointer in calls to editor routines. In this case, theroutine will be terminated and no system crash will occur.

SetModeChanges the value of the parameter mode in theconvolution or correlation task descriptor.

Syntax

Fortran:

status = vslconvsetmode(task, newmode)

status = vslcorrsetmode(task, newmode)

C:

status = vslConvSetMode(task, newmode);

status = vslCorrSetMode(task, newmode);

Description

The routine changes the value of the parameter mode for the operation of convolution orcorrelation. This parameter defines whether the computation should be done via Fouriertransforms of the input/output data or using a direct algorithm. Initial value for mode is assignedby a task constructor.

Predefined values for the mode parameter are as follows:

Table 10-16 Values of mode parameter

PurposeValue

Compute convolution by using fast Fourier transform.VSL_CONV_MODE_FFT

2415


PurposeValue

Compute correlation by using fast Fourier transform.VSL_CORR_MODE_FFT

Compute convolution directly.VSL_CONV_MODE_DIRECT

Compute correlation directly.VSL_CORR_MODE_DIRECT

Automatically choose direct or Fourier mode for convolution.VSL_CONV_MODE_AUTO

Automatically choose direct or Fourier mode for correlation.VSL_CORR_MODE_AUTO

Input Parameters

DescriptionTypeName

Pointer to the task descriptor.FORTRAN:TYPE(VSL_CONV_TASK) forvslconvsetmode

task

TYPE(VSL_CORR_TASK) forvslcorrsetmode

C: VSLConvTaskPtr forvslConvSetMode

VSLCorrTaskPtr forvslCorrSetMode

New value of the parameter mode.FORTRAN: INTEGER

C: int

newmode

Output Parameters

DescriptionTypeName

Current status of the task.FORTRAN: INTEGER

C: int

status

2416


SetInternalPrecisionChanges the value of the parameterinternal_precision in the convolution orcorrelation task descriptor.

Syntax

Fortran:

status = vslconvsetinternalprecision(task, precision)

status = vslcorrsetinternalprecision(task, precision)

C:

status = vslConvSetInternalPrecision(task, precision);

status = vslCorrSetInternalPrecision(task, precision);

Description

The routine changes the value of the parameter internal_precision for the operation ofconvolution or correlation. This parameter defines whether the internal computations of theconvolution or correlation result should be done in single or double precision. Initial value forinternal_precision is assigned by a task constructor and set to either “single” or “double”according to the particular flavor of the constructor used.

Changing the internal_precision can be useful if the default setting of this parameter was“single” but you want to calculate the result with double precision even if input and output dataare represented in single precision.

Predefined values for the internal_precision input parameter are as follows:

Table 10-17 Values of internal_precision Parameter

PurposeValue

Compute convolution with single precision.VSL_CONV_PRECISION_SINGLE

Compute correlation with single precision.VSL_CORR_PRECISION_SINGLE

Compute convolution with double precision.VSL_CONV_PRECISION_DOUBLE

Compute correlation with double precision.VSL_CORR_PRECISION_DOUBLE

2417


Input Parameters

DescriptionTypeName

Pointer to the task descriptor.FORTRAN:TYPE(VSL_CONV_TASK) forvslconvsetinternalprecision

task

TYPE(VSL_CORR_TASK) forvslcorrsetinternalprecision

C: VSLConvTaskPtr forvslConvSetInternalPrecision

VSLCorrTaskPtr forvslCorrSetInternalPrecision

New value of the parameter internal_precision.FORTRAN: INTEGER

C: int

precision

Output Parameters

DescriptionTypeName


C: int

status

SetStartChanges the value of the parameter start in theconvolution or correlation task descriptor.

Syntax

Fortran:

status = vslconvsetstart(task, start)

status = vslcorrsetstart(task, start)

2418


C:

status = vslConvSetStart(task, start);

status = vslCorrSetStart(task, start);

Description

The routine sets the value of the parameter start for the operation of convolution or correlation.In a one-dimensional case, this parameter points to the first element in the mathematical resultthat should be stored in the output array. In a multidimensional case, start is an array ofindices and its length is equal to the number of dimensions specified by the parameter dims.For more information about the definition and effect of this parameter, see Data Allocation.

During the initial task descriptor construction, the default value for start is undefined and thisparameter is not used. Hence, the only way to set and use the start parameter is via assigningit some value by one of the SetStart routines.

Input Parameters

DescriptionTypeName

Pointer to the task descriptor.FORTRAN:TYPE(VSL_CONV_TASK) forvslconvsetstart

task

TYPE(VSL_CORR_TASK) forvslcorrsetstart

C: VSLConvTaskPtr forvslConvSetStart

VSLCorrTaskPtr forvslCorrSetStart

New value of the parameter start.FORTRAN: INTEGER,DIMENSION (*)

start

C: int[]

2419


Output Parameters

DescriptionTypeName


C: int

status

SetDecimationChanges the value of the parameter decimationin the convolution or correlation task descriptor.

Syntax

Fortran:

status = vslconvsetdecimation(task, decimation)

status = vslcorrsetdecimation(task, decimation)

C:

status = vslConvSetDecimation(task, decimation);

status = vslCorrSetDecimation(task, decimation);

Description

The routine sets the value of the parameter decimation for the operation of convolution orcorrelation.

This parameter determines how to thin out the mathematical result of convolution or correlationbefore writing it into the output data array. For example, in a one-dimensional case, ifdecimation = d >1, only every d-th element of the mathematical result is written to the outputarray z. In a multidimensional case, decimation is an array of indices and its length is equalto the number of dimensions specified by the parameter dims. For more information about thedefinition and effect of this parameter, see Data Allocation.

During the initial task descriptor construction, the default value for decimation is undefinedand this parameter is not used. Hence, the only way to set and use the decimation parameteris via assigning it some value by one of the SetDecimation routines.

2420


Input Parameters

DescriptionTypeName

Pointer to the task descriptor.FORTRAN:TYPE(VSL_CONV_TASK) forvslconvsetdecimation

task

TYPE(VSL_CORR_TASK) forvslcorrsetdecimation

C: VSLConvTaskPtr forvslConvSetDecimation

VSLCorrTaskPtr forvslCorrSetDecimation

New value of the parameter decimation.FORTRAN: INTEGER,DIMENSION (*)

decimation

C: int[]

Output Parameters

DescriptionTypeName


C: int

status

Task Execution Routines

Task execution routines compute convolution or correlation results based on parameters heldby the task descriptor and on the supplied user data for input vectors.

Once created and adjusted, the task can be executed multiple times by applying to differentinput/output data of the same type, precision, and shape.

Intel MKL implementation of the convolution and correlation API provides two different formsof execution routines: a general form and an X-form. General form executors use the taskdescriptor created by the general form constructor and expect to get two source data arrays xand y on input. Alternatively, X-form executors use the task descriptor created by the X-formconstructor and expect to get only one source data array y on input because the first array xhas been already specified on the construction stage.

2421


When the task is executed for the first time, the execution routine includes task commitmentoperation, which involves two basic steps: parameters consistency check and preparation ofauxiliary data (for example, this might be the calculation of Fourier transform for input data).

For each execution routine there is also an associated one-dimensional version which exploitsthe algorithmic and computational benefits provided by the simplicity of the data structures forone-dimensional case.

NOTE. You can use the NULL task pointer in calls to execution routines. In this case,the routine will be terminated and no system crash will occur.

If the task is executed successfully, the execution routine returns zero status code. If an erroris detected, the execution routine returns an error code which signals that a specific error hasoccurred. In particular, an error status code is returned in the following cases:

• if the task pointer is NULL,

• if the task descriptor is corrupted,

• if calculation has failed for some other reason.

NOTE. Intel® MKL does not control floating-point erros, like overflow or gradual underflow,or operations with NaNs, etc.

If an error occurs, the task descriptor stores the error code.

The table below lists all task execution routines.

Table 10-18 Task Execution Routines

DescriptionRoutine

Computes convolution or correlation for a multidimensional case.Exec

Computes convolution or correlation for a one-dimensional case.Exec1D

Computes convolution or correlation as X-form for a multidimensional case.ExecX

Computes convolution or correlation as X-form for a one-dimensional case.ExecX1D

2422


ExecComputes convolution or correlation formultidimensional case.

Syntax

Fortran:

status = vslsconvexec(task, x, xstride, y, ystride, z, zstride)

status = vsldconvexec(task, x, xstride, y, ystride, z, zstride)

status = vslscorrexec(task, x, xstride, y, ystride, z, zstride)

status = vsldcorrexec(task, x, xstride, y, ystride, z, zstride)

C:

status = vslsConvExec(task, x, xstride, y, ystride, z, zstride);

status = vsldConvExec(task, x, xstride, y, ystride, z, zstride);

status = vslsCorrExec(task, x, xstride, y, ystride, z, zstride);

status = vsldCorrExec(task, x, xstride, y, ystride, z, zstride);

Description

Each of the Exec routines computes convolution or correlation of the data provided by thearrays x and y and then stores the results in the array z. Parameters of the operation are readfrom the task descriptor created previously by a corresponding NewTask constructor and pointedto by task. If task is NULL, no operation is done.

The stride parameters xstride, ystride, and zstride specify the physical location of theinput and output data in the arrays x, y, and z, respectively. In a one-dimensional case, strideis an interval between locations of consecutive elements of the array. For example, if the valueof the parameter zstride is s, then only every sth element of the array z will be used to storethe output data. The stride value must be positive or negative but not zero.

2423


Input Parameters

DescriptionTypeName

Pointer to the task descriptor.FORTRAN:TYPE(VSL_CONV_TASK) forvslsconvexec andvsldconvexec

task

TYPE(VSL_CORR_TASK) forvslscorrexec andvsldcorrexec

C: VSLConvTaskPtr forvslsConvExec andvsldConvExec

VSLCorrTaskPtr forvslsCorrExec andvsldCorrExec

Pointers to arrays containing input data. See DataAllocation for more information.

FORTRAN: REAL(KIND=4),DIMENSION(*) forvslsconvexec andvslscorrexec

x , y

REAL(KIND=8),DIMENSION(*) forvsldconvexec andvsldcorrexec

C: float[] forvslsConvExec andvslsCorrExec

double[] for vsldConvExecand vsldCorrExec

Strides for input and output data. For moreinformation, see Data Allocation.


C: int[]

xstride,ystride,zstride

2424


Output Parameters

DescriptionTypeName

Pointer to the array that stores output data. See DataAllocation for more information.

FORTRAN: REAL(KIND=4),DIMENSION(*) forvslsconvexec andvslscorrexec

z

REAL(KIND=8),DIMENSION(*) forvsldconvexec andvsldcorrexec

C: float[] forvslsConvExec andvslsCorrExec

double[] for vsldConvExecand vsldCorrExec

Set to VSL_STATUS_OK if the task is executedsuccessfully or set to non-zero error code otherwise.

FORTRAN: INTEGER

C: int

status

Exec1DComputes convolution or correlation forone-dimensional case.

Syntax

Fortran:

status = vslsconvexec1d(task, x, xstride, y, ystride, z, zstride)

status = vsldconvexec1d(task, x, xstride, y, ystride, z, zstride)

status = vslscorrexec1d(task, x, xstride, y, ystride, z, zstride)

status = vsldcorrexec1d(task, x, xstride, y, ystride, z, zstride)

2425


C:

status = vslsConvExec1D(task, x, xstride, y, ystride, z, zstride);

status = vsldConvExec1D(task, x, xstride, y, ystride, z, zstride);

status = vslsCorrExec1D(task, x, xstride, y, ystride, z, zstride);

status = vsldCorrExec1D(task, x, xstride, y, ystride, z, zstride);

Description

Each of the Exec1D routines computes convolution or correlation of the data provided by thearrays x and y and then stores the results in the array z. These routines represent a specialone-dimensional version of the operation, assuming that the value of the parameter dims is1. Using this version of execution routines can help speed up performance in case ofone-dimensional data.

Parameters of the operation are read from the task descriptor created previously by acorresponding NewTask1D constructor and pointed to by task. If task is NULL, no operationis done.

Input Parameters

DescriptionTypeName

Pointer to the task descriptor.FORTRAN:TYPE(VSL_CONV_TASK) forvslsconvexec1d andvsldconvexec1d

task

TYPE(VSL_CORR_TASK) forvslscorrexec1d andvsldcorrexec1d

C: VSLConvTaskPtr forvslsConvExec1D andvsldConvExec1D

VSLCorrTaskPtr forvslsCorrExec1D andvsldCorrExec1D

2426


DescriptionTypeName

Pointers to arrays containing input data. See DataAllocation for more information.

FORTRAN: REAL(KIND=4),DIMENSION(*) forvslsconvexec1d andvslscorrexec1d

x, y

REAL(KIND=8),DIMENSION(*) forvsldconvexec1d andvsldcorrexec1d

C: float[] forvslsConvExec1D andvslsCorrExec1D

double[] for vsldConvExe1Dand vsldCorrExec1D

Strides for input and output data. For moreinformation, see stride parameters.

FORTRAN: INTEGER

C: int

xstride,ystride,zstride

Output Parameters

DescriptionTypeName


FORTRAN: REAL(KIND=4),DIMENSION(*) forvslsconvexec1d andvslscorrexec1d

z

REAL(KIND=8),DIMENSION(*) forvsldconvexec1d andvsldcorrexec1d

C: float[] forvslsConvExec1D andvslsCorrExec1D

2427


DescriptionTypeName

double[] forvsldConvExec1D andvsldCorrExec1D


FORTRAN: INTEGER

C: int

status

ExecXComputes convolution or correlation formultidimensional case with the fixed first operandvector.

Syntax

Fortran:

status = vslsconvexecx(task, y, ystride, z, zstride)

status = vsldconvexecx(task, y, ystride, z, zstride)

status = vslscorrexecx(task, y, ystride, z, zstride)

status = vsldcorrexecx(task, y, ystride, z, zstride)

C:

status = vslsConvExecX(task, y, ystride, z, zstride);

status = vsldConvExecX(task, y, ystride, z, zstride);

status = vslsCorrExecX(task, y, ystride, z, zstride);

status = vsldCorrExecX(task, y, ystride, z, zstride);

Description

Each of the ExecX routines computes convolution or correlation of the data provided by thearrays x and y and then stores the results in the array z. These routines represent a specialversion of the operation, which assumes that the first operand vector was set on the taskconstruction stage and the task object keeps the pointer to the array x.

2428


Parameters of the operation are read from the task descriptor created previously by acorresponding NewTaskX constructor and pointed to by task. If task is NULL, no operation isdone.

Using this form of execution routines is recommended when you need to compute multipleconvolutions or correlations with the same data vector in array x against different vectors inarray y. This helps improve performance by eliminating unnecessary overhead in repeatedcomputation of intermediate data required for the operation.

Input Parameters

DescriptionTypeName

Pointer to the task descriptor.FORTRAN:TYPE(VSL_CONV_TASK) forvslsconvexecx andvsldconvexecx

task

TYPE(VSL_CORR_TASK) forvslscorrexecx andvsldcorrexecx

C: VSLConvTaskPtr forvslsConvExecX andvsldConvExecX

VSLCorrTaskPtr forvslsCorrExecX andvsldCorrExecX

Pointer to array containing input data (for the secondoperand vector). See Data Allocation for moreinformation.

FORTRAN: REAL(KIND=4),DIMENSION(*) forvslsconvexecx andvslscorrexecx

x ,y

REAL(KIND=8),DIMENSION(*) forvsldconvexecx andvsldcorrexecx

C: float[] forvslsConvExecX andvslsCorrExecX

2429


DescriptionTypeName

double[] for vsldConvExeXand vsldCorrExecX



C: int[]

ystride,zstride

Output Parameters

DescriptionTypeName


FORTRAN: REAL(KIND=4),DIMENSION(*) forvslsconvexecx andvslscorrexecx

z

REAL(KIND=8),DIMENSION(*) forvsldconvexecx andvsldcorrexecx

C: float[] forvslsConvExecX andvslsCorrExecX

double[] for vsldConvExecXand vsldCorrExecX


FORTRAN: INTEGER

C: int

status

2430


ExecX1DComputes convolution or correlation forone-dimensional case with the fixed first operandvector.

Syntax

Fortran:

status = vslsconvexecx1d(task, y, ystride, z, zstride)

status = vsldconvexecx1d(task, y, ystride, z, zstride)

status = vslscorrexecx1d(task, y, ystride, z, zstride)

status = vsldcorrexecx1d(task, y, ystride, z, zstride)

C:

status = vslsConvExecX1D(task, y, ystride, z, zstride);

status = vsldConvExecX1D(task, y, ystride, z, zstride);

status = vslsCorrExecX1D(task, y, ystride, z, zstride);

status = vsldCorrExecX1D(task, y, ystride, z, zstride);

Description

Each of the ExecX1D routines computes convolution or correlation of one-dimensional (assumingthat dims =1)data provided by the arrays x and y and then stores the results in the array z.These routines represent a special version of the operation, which expects that the first operandvector was set on the task construction stage.

Parameters of the operation are read from the task descriptor created previously by acorresponding NewTaskX1D constructor and pointed to by task. If task is NULL, no operationis done.

Using this form of execution routines is recommended when you need to compute multipleone-dimensional convolutions or correlations with the same data vector in array x againstdifferent vectors in array y. This helps improve performance by eliminating unnecessary overheadin repeated computation of intermediate data required for the operation.

2431


Input Parameters

DescriptionTypeName

Pointer to the task descriptor.FORTRAN:TYPE(VSL_CONV_TASK) forvslsconvexecx1d andvsldconvexecx1d

task

TYPE(VSL_CORR_TASK) forvslscorrexecx1d andvsldcorrexecx1d

C: VSLConvTaskPtr forvslsConvExecX1D andvsldConvExecX1D

VSLCorrTaskPtr forvslsCorrExecX1D andvsldCorrExecX1D

Pointer to array containing input data (for the secondoperand vector). See Data Allocation for moreinformation.

FORTRAN: REAL(KIND=4),DIMENSION(*) forvslsconvexecx1d andvslscorrexecx1d

x , y

REAL(KIND=8),DIMENSION(*) forvsldconvexecx1d andvsldcorrexecx1d

C: float[] forvslsConvExecX1D andvslsCorrExecX1D

double[] forvsldConvExeX1D andvsldCorrExecX1D


FORTRAN: INTEGER

C: int

ystride,zstride

2432


Output Parameters

DescriptionTypeName


FORTRAN: REAL(KIND=4),DIMENSION(*) forvslsconvexecx1d andvslscorrexecx1d

z

REAL(KIND=8),DIMENSION(*) forvsldconvexecx1d andvsldcorrexec1d

C: float[] forvslsConvExecX1D andvslsCorrExecX1D

double[] forvsldConvExecX1D andvsldCorrExecX1D


FORTRAN: INTEGER

C: int

status

Task Destructors

Task destructors are routines designed for deleting task objects and deallocating memory.

DeleteTaskDestroys the task object and frees the memory.

Syntax

Fortran:

errcode = vslconvdeletetask(task)

errcode = vslcorrdeletetask(task)

2433


C:

errcode = vslConvDeleteTask(task);

errcode = vslCorrDeleteTask(task);

Description

Given a pointer to a task descriptor, this routine deletes the task descriptor object and freesthe memory allocated for the data structure. If the task holds a work memory, the latter is alsofreed. The task pointer is set to NULL.

Note that if by some reason the task was not deleted successfully, the routine returns an errorcode. This error code has no relation to the task status code and does not change it.

NOTE. You can use the NULL task pointer in calls to destructor routines. In this case,the routine will be terminated and no system crash will occur.

Input Parameters

DescriptionTypeName

Pointer to the task descriptor.FORTRAN:TYPE(VSL_CONV_TASK) forvslconvdeletetask

task

TYPE(VSL_CORR_TASK) forvslcorrdeletetask

C: VSLConvTaskPtr* forvslConvDeleteTask

VSLCorrTaskPtr* forvslCorrDeleteTask

Output Parameters

DescriptionTypeName

Contains 0 if the task object is deleted successfully.Contains error code if an error occurred.

FORTRAN: INTEGER

C: int

errcode

2434


Task Copy

The routines are designed for copying convolution and correlation task descriptors.

CopyTaskCopies a descriptor for convolution or correlationtask.

Syntax

Fortran:

status = vslconvcopytask(newtask, srctask)

status = vslcorrcopytask(newtask, srctask)

C:

status = vslConvCopyTask(newtask, srctask);

status = vslCorrCopyTask(newtask, srctask);

Description

If a task object srctask already exists, you can use an appropriate CopyTask routine to makeits copy in newtask. After the copy operation, both source and new task objects will becomecommitted (see Overview for information about task commitment). If the source task was notpreviously committed, the commitment operation for this task is implicitly invoked beforecopying starts. If an error occurs during source task commitment, the task stores the errorcode in the status field. If an error occurs during copy operation, the routine returns a NULLpointer instead of a reference to a new task object.

Input Parameters

DescriptionTypeName

Pointer to the source task descriptor.FORTRAN:TYPE(VSL_CONV_TASK) forvslconvcopytask

srctask

TYPE(VSL_CORR_TASK) forvslcorrcopytask

2435


DescriptionTypeName

C: VSLConvTaskPtr forvslConvCopyTask

VSLCorrTaskPtr forvslCorrCopyTask

Output Parameters

DescriptionTypeName

Pointer to the new task descriptor.FORTRAN:TYPE(VSL_CONV_TASK) forvslconvcopytask

newtask

TYPE(VSL_CORR_TASK) forvslcorrcopytask

C: VSLConvTaskPtr* forvslConvCopyTask

VSLCorrTaskPtr* forvslCorrCopyTask

Current status of the source task.FORTRAN: INTEGER

C: int

status

Usage Examples

This section demonstrates how you can use the Intel MKL routines to perform some commonconvolution and correlation operations both for single threaded and multiple threadedcalculations. The following two sample functions scond1 and sconf1 simulate the convolutionand correlation functions SCOND and SCONF found in IBM ESSL* library. The functions assumesingle threaded calculations and can be used with C or C++ compilers.

2436


Example 10-5 Function scond1 for Single Threaded Calculations#include "mkl_vsl.h"

int scond1(

float h[], int inch,

float x[], int incx,

float y[], int incy,

int nh, int nx, int iy0, int ny)

{

int status;

VSLConvTaskPtr task;

vslsConvNewTask1D(&task,VSL_CONV_MODE_DIRECT,nh,nx,ny);

vslConvSetStart(task, &iy0);

status = vslsConvExec1D(task, h,inch, x,incx, y,incy);

vslConvDeleteTask(&task);

return status;

}

Example 10-6 Function sconf1 for Single Threaded Calculations#include "mkl_vsl.h"

int sconf1(

int init,

float h[], int inc1h,

float x[], int inc1x, int inc2x,

float y[], int inc1y, int inc2y,

int nh, int nx, int m, int iy0, int ny,

void* aux1, int naux1, void* aux2, int naux2)

{

int status;

/* assume that aux1!=0 and naux1 is big enough */

2437


VSLConvTaskPtr* task = (VSLConvTaskPtr*)aux1;

if (init != 0)

/* initialization: */

status = vslsConvNewTaskX1D(task,VSL_CONV_MODE_FFT,

nh,nx,ny, h,inc1h);

if (init == 0) {

/* calculations: */

int i;

vslConvSetStart(*task, &iy0);

for (i=0; i<m; i++) {

float* xi = &x[inc2x * i];

float* yi = &y[inc2y * i];

/* task is implicitly committed at i==0 */

status = vslsConvExecX1D(*task, xi, inc1x, yi, inc1y);

};

};

vslConvDeleteTask(task);

return status;

}

Using Multiple Threads

For functions such as sconf1 described in the previous example, parallel calculations may bemore preferable instead of cycling. If m>1, you can use multiple threads for invoking the taskexecution against different data sequences. For such cases, use task copy routines to create mcopies of the task object before the calculations stage and then run these copies with differentthreads. Ensure that you make all necessary parameter adjustments for the task (using TaskEditors) before copying it.

2438

The sample code for that can look like following:

if (init == 0) {

int i, status, ss[M];

VSLConvTaskPtr tasks[M];

/* assume that M is big enough */

. . .

vslConvSetStart(*task, &iy0);

. . .

for (i=0; i<m; i++)

/* implicit commitment at i==0 */

vslConvCopyTask(&tasks[i],*task);

. . .

Then, m threads may be started to execute different copies of the task:

. . .

float* xi = &x[inc2x * i];

float* yi = &y[inc2y * i];

ss[i]=vslsConvExecX1D(tasks[i], xi,inc1x, yi,inc1y);

. . .

And finally, after all threads have finished the calculations, overall status ought to be collectedfrom all task objects. The following code assumes signaling the first error found, if any:

. . .

for (i=0; i<m; i++) {

status = ss[i];

if (status != 0) /* 0 means "OK" */

break;

};

return status;

}; /* end if init==0 */

2439

Execution routines modify the task internal state (fields of the task structure). Such modificationsmay conflict with each other if different threads work with the same task object simultaneously.This is the reason why different threads must use different copies of the task.

Mathematical Notation and Definitions

The following notation is necessary to explain the underlying mathematical definitions used inthe text:

The set of real numbers.R = (-∞, +∞)The set of integer numbers.Z = {0, ±1, ±2, ...}The set of N-dimensional series of integer numbers.ZN = Z× ... ×ZN-dimensional series of integers.p = (p1, ..., pN) ∈ ZN

Function u with arguments from ZN and values from R.u:ZN→RThe value of the function u for the argument (p1, ..., pN).u(p) = u(p1, ..., pN)Function w is the convolution of the functions u, v.w = u*vFunction w is the correlation of the functions u, v.w = u•v

Given series p, q ∈ ZN:

• series r = p + q is defined as rn = pn + qn for every n=1,...,N

• series r = p - q is defined as rn = pn - qn for every n=1,...,N

• series r = sup{p, q} is defines as rn = max{pn, qn} for every n=1,...,N

• series r = inf{p, q} is defined as rn = min{pn, qn} for every n=1,...,N

• inequality p ≤ q means that pn ≤ qn for every n=1,...,N.

A function u(p) is called a finite function if there exist series Pmin, Pmax ∈ ZN such that:

u(p) ≠ 0

implies

Pmin ≤ p ≤ Pmax.

Operations of convolution and correlation are only defined for finite functions.

Consider functions u, v and series Pmin, PmaxQmin, Qmax ∈ ZN such that:

u(p) ≠ 0 implies Pmin ≤ p ≤ Pmax.

2440


v(q) ≠ 0 implies Qmin ≤ q ≤ Qmax.

Definitions of linear correlation and linear convolution for functions u and v are given below.

Linear Convolution

If function w = u*v is the convolution of u and v, then:

w(r) ≠ 0 implies Rmin ≤ r ≤ Rmax,

where Rmin = Pmin + Qmin and Rmax = Pmax + Qmax.

If Rmin ≤ r ≤ Rmax, then:

w(r) = ∑u(t)·v(r−t) is the sum for all t ∈ ZN such that Tmin ≤ t ≤ Tmax,

where Tmin = sup{Pmin, r − Qmax} and Tmax = inf{Pmax, r − Qmin}.

Linear Correlation

If function w = u • v is the correlation of u and v, then:

w(r) ≠ 0 implies Rmin ≤ r ≤ Rmax,

where Rmin = Qmin - Pmax and Rmax = Qmax - Pmin.

If Rmin ≤ r ≤ Rmax, then:

w(r) = ∑u(t)·v(r+t) is the sum for all t ∈ ZN such that Tmin ≤ t ≤ Tmax,

where Tmin = sup{Pmin, Qmin − r} and Tmax = inf{Pmax, Qmax − r}.

Representation of the functions u, v, w as the input/output data for the Intel MKL convolutionand correlation functions is described in the Data Allocation section below.

Data Allocation

This section explains the relation between:

• mathematical finite functions u, v, w introduced in the section Mathematical Notation andDefinitions;

2441


• multi-dimensional input and output data vectors representing the functions u, v, w;

• arrays u, v, w used to store the input and output data vectors in computer memory

The convolution and correlation routine parameters that determine the allocation of input andoutput data are the following:

• Data arrays x, y, z

• Shape arrays xshape, yshape, zshape

• Strides within arrays xstride, ystride, zstride

• Parameters start, decimation

Finite Functions and Data Vectors

The finite functions u(p), v(q), and w(r) introduced above are represented as multi-dimensionalvectors of input and output data:

inputu(i1,...,idims) for u(p1,...,pN)

inputv(j1,...,jdims) for v(q1,...,qN)

output(k1,...,kdims) for w(r1,...,rN).

Parameter dims represents the number of dimensions and is equal to N.

The parameters xshape, yshape, and zshape define the shapes of input/output vectors:

inputu(i1,...,idims) is defined if 1 ≤ in ≤ xshape(n) for every n=1,...,dims

inputv(j1,...,jdims) is defined if 1 ≤ jn ≤ yshape(n) for every n=1,...,dims

output(k1,...,kdims) is defined if 1 ≤ kn ≤ zshape(n) for every n=1,...,dims.

Relation between the input vectors and the functions u and v is defined by the following formulas:

inputu(i1,...,idims)= u(p1,...,pN), where pn = Pnmin + (in-1) for every n

inputv(j1,...,jdims)= v(q1,...,qN), where qn=Qnmin + (jn-1) for every n.

Relation between the output vector and the function w(r) is similar (but only in the case whenparameters start and decimation are not defined):

output(k1,...,kdims)= w(r1,...,rN), where rn=Rnmin + (kn-1) for every n.

If the parameter start is defined, it must belong to the interval Rnmin ≤ start(n) ≤ Rn

max.If defined, the start parameter replaces Rmin in the formula:

2442


output(k1,...,kdims)=w(r1,...,rN), where rn=start(n) + (kn-1)

If the parameter decimation is defined, it changes the relation according to the followingformula:

output(k1,...,kdims)=w(r1,...,rN), where rn= Rnmin + (kn-1)*decimation(n)

If both parameters start and decimation are defined, the formula is as follows:

output(k1,...,kdims)=w(r1,...,rN), where rn=start(n) + (kn-1)*decimation(n)

The convolution and correlation software checks the values of zshape, start, and decimationduring task commitment. If rn exceeds Rn

max for some kn,n=1,...,dims, an error is raised.

Allocation of Data Vectors

Both parameter arrays x and y contain input data vectors in memory, while array z is intendedfor storing output data vector. To access the memory, the convolution and correlation softwareuses only pointers to these arrays and ignores the array shapes.

For parameters x, y, and z, you can provide one-dimensional arrays with the requirement thatactual length of these arrays be sufficient to store the data vectors.

The allocation of the input and output data inside the arrays x, y, and z is described below

assuming that the arrays are one-dimensional. Given multi-dimensional indices i, j, k ∈ ZN,

one-dimensional indices e, f, g ∈ Z are defined such that:

inputu(i1,...,idims) is allocated at x(e)

inputv(j1,...,jdims) is allocated at y(f)

output(k1,...,kdims) is allocated at z(g).

The indices e, f, and g are defined as follows:

e = 1 + ∑xstride(n)·dx(n) (the sum is for all n=1,...,dims)

f = 1 + ∑ystride(n)·dy(n) (the sum is for all n=1,...,dims)

g = 1 + ∑zstride(n)·dz(n) (the sum is for all n=1,...,dims)

The distances dx(n), dy(n), and dz(n) depend on the signum of the stride:

dx(n) = in-1 if xstride(n)>0, or dx(n) = in-xshape(n) if xstride(n)<0

dy(n) = jn-1 if ystride(n)>0, or dy(n) = jn-yshape(n) if ystride(n)<0

2443

dz(n) = kn-1 if zstride(n)>0, or dz(n) = kn-zshape(n) if zstride(n)<0

The definitions of indices e, f, and g assume that indexes for arrays x, y, and z are startedfrom unity:

x(e) is defined for e=1,...,length(x)

y(f) is defined for f=1,...,length(y)

z(g) is defined for g=1,...,length(z)

Below is a detailed explanation about how elements of the multi-dimensional output vector arestored in the array z for one-dimensional and two-dimensional cases.

One-dimensional case. If dims=1, then zshape is the number of the output values to bestored in the array z. The actual length of array z may be greater than zshape elements.

If zstride>1, output values are stored with the stride: output(1) is stored to z(1), output(2)is stored to z(1+zstride), and so on. Hence, the actual length of z must be at least1+zstride*(zshape-1) elements or more.

If zstride<0, it still defines the stride between elements of array z. However, the order of theused elements is the opposite. For the k-th output value, output(k) is stored inz(1+|zstride|*(zshape-k)), where |zstride| is the absolute value of zstride. The actuallength of the array z must be at least 1+|zstride|*(zshape - 1) elements.

Two-dimensional case. If dims=2, the output data is a two-dimensional matrix. The valuezstride(1) defines the stride inside matrix columns, that is, the stride between the output(k1,k2) and output(k1+1, k2) for every pair of indices k1, k2. On the other hand, zstride(2)defines the stride between columns, that is, the stride between output(k1,k2) andoutput(k1,k2+1).

If zstride(2) is greater than zshape(1), this causes sparse allocation of columns. If thevalue of zstride(2) is smaller than zshape(1), this may result in the transposition of theoutput matrix. For example, if zshape = (2,3), you can define zstride = (3,1) to allocateoutput values like transposed matrix of the shape 3x2.

Whether zstride assumes this kind of transformations or not, you need to ensure that differentelements output (k1, ...,kdims) will be stored in different locations z(g).

2444

11Fourier Transform Functions

This chapter describes the following implementations of Discrete Fourier transform functions available inIntel® MKL:

• Discrete Fourier transform (DFT) functions for single-processor or shared-memory systems (see DFTFunctions below)

• Cluster DFT Functions for distributed-memory architectures (available with Intel® MKL Cluster Editiononly).

Both these groups of DFT functions present a uniform and easy-to-use Applications Programmer Interfaceproviding fast computation of DFT via the Fast Fourier Transform (FFT) algorithm.

NOTE. DFT functions support arbitrary lengths.

These routines offer broad functionality and high performance not only for radix 2 but alsofor 3, 5, 7, 11, and other radices.

DFT FunctionsThe Discrete Fourier Transform function library of Intel MKL provides one-dimensional,two-dimensional, and multi-dimensional (up to the order of 7) routines and both Fortran and Cinterfaces for all transform functions.

The full list of DFT functions implemented in Intel MKL is given in the table below:

Table 11-1 DFT Functions in Intel MKL

OperationFunction Name

Descriptor Manipulation Functions

Allocates memory for the descriptor data structure andinstantiates it with default configuration settings.

DftiCreateDescriptor

Performs all initialization that facilitates the actual DFTcomputation.

DftiCommitDescriptor

Copies an existing descriptor.DftiCopyDescriptor

2445


Frees memory allocated for a descriptor.DftiFreeDescriptor

DFT Computation Functions

Computes the forward DFT.DftiComputeForward

Computes the backward DFT.DftiComputeBackward

Descriptor Configuration Functions

Sets one particular configuration parameter with the specifiedconfiguration value.

DftiSetValue

Gets the configuration value of one particular configurationparameter.

DftiGetValue

Status Checking Functions

Checks if the status reflects an error of a predefined class.DftiErrorClass

Generates an error message.DftiErrorMessage

Description of DFT functions is followed by discussion of configuration settings (see ConfigurationSettings) and various configuration parameters used.

Computing DFT

DFT functions described later in this chapter are implemented in Fortran and C interface. Fortranstands for Fortran 95. DFT interface relies critically on many modern features offered in Fortran95 that have no counterpart in Fortran 77

NOTE. Following the explicit function interface in Fortran, data array must be definedas one-dimensional for any transformation type.

The materials presented in this chapter assume the availability of native complex types in C asthey are specified in C9X.

You can find example code that uses DFT interface functions to compute transform results inFourier Transform Functions Code Examples section in the Appendix C.

2446


For most common situations, we expect a DFT computation can be effected by four functioncalls. The approach adopted in Intel MKL for DFT computation uses one single data structure,the descriptor, to record flexible configuration whose parameters can be changed independently.This results in enhanced functionality and ease of use.

The record of type DFTI_DESCRIPTOR, when created, contains information about the lengthand domain of the DFT to be computed, as well as the setting of a rather large number ofconfiguration parameters. The default settings for all of these parameters include, for example,the following:

• the DFT to be computed does not have a scale factor;

• there is only one set of data to be transformed;

• the data is stored contiguously in memory;

• the computed result overwrites (in place) the input data; etc.

Should any one of these many default settings be inappropriate, they can be changedone-at-a-time through the function DftiSetValue as illustrated in the Example C-20 andExample C-21.

DFT Interface

To use the DFT functions, you need to access the module MKL_DFTI through the "use" statementin Fortran; or access the header file mkl_dfti.h through "include" in C.

The Fortran interface provides a derived type DFTI_DESCRIPTOR; a number of named constantsrepresenting various names of configuration parameters and their possible values; and a numberof overloaded functions through the generic functionality of Fortran 95.

The C interface provides a structure type DFTI_DESCRIPTOR, a macro definition

#define DFTI_DESCRIPTOR_HANDLE DFTI_DESCRIPTOR *;

a number of named constants of two enumeration types DFTI_CONFIG_PARAM andDFTI_CONFIG_VALUE; and a number of functions, some of which accept different number ofinput arguments.

NOTE. Some of the DFT functions and/or functionality described in the subsequentsections of this chapter may not be supported by the currently available implementationof the library. You can find the complete list of the implementation-specific exceptionsin the release notes to your version of the library.

There are four main categories of DFT functions in Intel MKL:

2447

Fourier Transform Functions 11

1. Descriptor Manipulation. There are four functions in this category. The first one,DftiCreateDescriptor, creates a DFT descriptor whose storage is allocated dynamicallyby the routine. This function configures the descriptor with default settings correspondingto a few input values supplied by the user.

The second, DftiCommitDescriptor, "commits" the descriptor to all its setting. In practice,this usually means that all the necessary precomputation will be performed. This may includefactorization of the input length and computation of all the required twiddle factors. Thethird function, DftiCopyDescriptor, makes an extra copy of a descriptor, and the fourthfunction, DftiFreeDescriptor, frees up all the memory allocated for the descriptorinformation.

2. DFT Computation. There are two functions in this category. The first, DftiComputeForward,effects a forward DFT computation, and the second function, DftiComputeBackward,performs a backward DFT computation.

3. Descriptor configuration. There are two functions in this category. One function,DftiSetValue, sets one specific value to one of the many configuration parameters thatare changeable (a few are not); the other, DftiGetValue, gets the current value of anyone of these configuration parameters (all are readable). These parameters, though many,are handled one-at-a-time.

4. Status Checking. The functions described in the three categories above return an integervalue denoting the status of the operation. In particular, a non-zero return value alwaysindicates a problem of some sort. Envisioned to be further enhanced in later releases ofIntel MKL, DFT interface at present provides for one logical status class function,DftiErrorClass, and a simple status message generation function, DftiErrorMessage.

Status Checking Functions

All of the descriptor manipulation, DFT computation, and descriptor configuration functionsreturn an integer value denoting the status of the operation. Two functions serve to check thestatus. The first function is a logical function that checks if the status reflects an error of apredefined class, and the second is an error message function that returns a character string.

2448


ErrorClassChecks if the status reflects an error of apredefined class.

Syntax

Fortran:

Predicate = DftiErrorClass( Status, Error_Class )

C:

predicate = DftiErrorClass( status, error_class );

Description

DFT interface in Intel MKL provides a set of predefined error class listed in Table 11-2. Theseare named constants and have the type INTEGER in Fortran and long in C.

Table 11-2 Predefined Error Class

CommentsNamed Constants

No errorDFTI_NO_ERROR

Usually associated with memory allocationDFTI_MEMORY_ERROR

Invalid settings of one or more configuration parametersDFTI_INVALID_CONFIGURATION

Inconsistent configuration or input parametersDFTI_INCONSISTENT_CONFIGURATION

Number of OMP threads in the computation function is notequal to the number of OMP threads in the initialization stage(commit function)

DFTI_NUMBER_OF_THREADS_ERROR

Usually associated with OMP routine's error return valueDFTI_MULTITHREADED_ERROR

Descriptor is unusable for computationDFTI_BAD_DESCRIPTOR

Unimplemented legitimate settings; implementation dependentDFTI_UNIMPLEMENTED

Internal library errorDFTI_MKL_INTERNAL_ERROR

2449



Length of one of dimensions exceeds 232 -1 (4 bytes).DFTI_1D_LENGTH_EXCEEDS_INT32

Note that the correct usage is to check if the status returns .TRUE. or .FALSE. through theuse of DftiErrorClass with a specific error class. Direct comparison of a status with thepredefined class is an incorrect usage. see Example C-22 on a correct use of the status checkingfunctions.

Interface and Prototype//Fortran interface

INTERFACE DftiErrorClass

//Note that the body provided here is to illustrate the different

//argument list and types of dummy arguments. The interface

//does not guarantee what the actual function names are.

//Users can only rely on the function name following the

//keyword INTERFACE

FUNCTION some_actual_function_8( Status, Error_Class )

LOGICAL some_actual_function_8

INTEGER, INTENT(IN) :: Status, Error_Class

END FUNCTION some_actual_function_8

END INTERFACE DftiErrorClass

/* C prototype */

long DftiErrorClass( long , long );

2450


ErrorMessageGenerates an error message.

Syntax

Fortran:

ERROR_MESSAGE = DftiErrorMessage( Status )

C:

error_message = DftiErrorMessage( status );

Description

The error message function generates an error message character string. The maximum lengthof the string in Fortran is given by the named constant DFTI_MAX_MESSAGE_LENGTH. The actualerror message is implementation dependent. In Fortran, the user needs to use a characterstring of length DFTI_MAX_MESSAGE_LENGTH as the target. In C, the function returns a pointerto a character string, that is, a character array with the delimiter ' 0'.

Example C-22 shows how this function can be implemented.

Interface and Prototype//Fortran interface

INTERFACE DftiErrorMessage





//keyword INTERFACE

FUNCTION some_actual_function_9( Status, Error_Class )

CHARACTER(LEN=DFTI_MAX_MESSAGE_LENGTH) some_actual_function_9( Status )

INTEGER, INTENT(IN) :: Status


END INTERFACE DftiErrorMessage

2451


/* C prototype */

char *DftiErrorMessage( long );


There are four functions in this category: create a descriptor, commit a descriptor, copy adescriptor, and free a descriptor.

CreateDescriptorAllocates memory for the descriptor data structureand instantiates it with default configurationsettings.

Syntax

Fortran:

Status = DftiCreateDescriptor( Desc_Handle, &

Precision, &

Forward_Domain, &

Dimension, &

Length )

C:

status = DftiCreateDescriptor( &desc_handle, precision, forward_domain,dimension, length );

Description

This function allocates memory for the descriptor data structure and instantiates it with all thedefault configuration settings with respect to the precision, domain, dimension, and length ofthe desired transform. The domain is understood to be the domain of the forward transform.Since memory is allocated dynamically, the result is actually a pointer to the created descriptor.This function is slightly different from the "initialization" routine in more traditional softwarepackages or libraries used for computing DFT. In all likelihood, this function will not perform

2452


any significant computation work such as twiddle factors computation, as the defaultconfiguration settings can still be changed upon user's request through the value setting functionDftiSetValue.

The precision and (forward) domain are specified through named constants provided in DFTinterface for the configuration values. The choices for precision are DFTI_SINGLE andDFTI_DOUBLE; and the choices for (forward) domain are DFTI_COMPLEX and DFTI_REAL. SeeTable 11-5 for the complete table of named constants for configuration values.

Dimension is a simple positive integer indicating the dimension of the transform. Length iseither a simple positive integer for one-dimensional transform, or an integer array (pointer inC) containing the positive integers corresponding to the lengths dimensions for multi-dimensionaltransform.

The function returns DFTI_NO_ERROR when completes successfully. See Status CheckingFunctions for more information on returned status.

Interface and Prototype!Fortran interface.

INTERFACE DftiCreateDescriptor

!Note that the body provided here is to illustrate the different

!argument list and types of dummy arguments. The interface

!does not guarantee what the actual function names are.

!Users can only rely on the function name following the keyword INTERFACE

FUNCTION some_actual_function_1D( Desc_Handle, Prec, Dom, Dim, Length )

INTEGER :: some_actual_function_1D

TYPE(DFTI_DESCRIPTOR), POINTER :: Desc_Handle

INTEGER, INTENT(IN) :: Prec, Dom

INTEGER, INTENT(IN) :: Dim, Length

END FUNCTION some_actual_function_1D

2453


FUNCTION some_actual_function_HIGHD( Desc_Handle, Prec, Dom, Dim, Length )

INTEGER :: some_actual_function_HIGHD


INTEGER, INTENT(IN) :: Prec, Dom

INTEGER, INTENT(IN) :: Dim, Length(*)

END FUNCTION some_actual_function_HIGHD

END INTERFACE DftiCreateDescriptor

Note that the function is overloaded as the actual argument for Length can be a scalar or a arank-one array.

/* C prototype */

long DftiCreateDescriptor( DFTI_DESCRIPTOR_HANDLE *,

DFTI_CONFIG_PARAM ,

DFTI_CONFIG_PARAM ,

long , ... );

The variable arguments facility is used to cope with the argument for lengths that can be ascalar (long), or an array (long *).

CommitDescriptorPerforms all initialization that facilitates the actualDFT computation.

Syntax

Fortran:

Status = DftiCommitDescriptor( Desc_Handle )

C:

status = DftiCommitDescriptor( desc_handle );

2454


Description

The interface requires a function that commits a previously created descriptor be invoked beforethe descriptor can be used for DFT computations. Typically, this committal performs allinitialization that facilitates the actual DFT computation. For a modern implementation, it mayinvolve exploring many different factorizations of the input length to search for highly efficientcomputation method.

Any changes of configuration parameters of a committed descriptor via the set value function(see Descriptor Configuration) requires a re-committal of the descriptor before a computationfunction can be invoked. Typically, this committal function call is immediately followed by acomputation function call (see DFT Computation).


Interface and Prototype! Fortran interface

INTERFACE DftiCommitDescriptor

!Note that the body provided here is to illustrate the different



!Users can only rely on the function name following the

!keyword INTERFACE

FUNCTION some_actual function_1 ( Desc_Handle )

INTEGER :: some_actual function_1


END FUNCTION some_actual function_1

END INTERFACE DftiCommitDescriptor

/* C prototype */

long DftiCommitDescriptor( DFTI_DESCRIPTOR_HANDLE );

2455


CopyDescriptorCopies an existing descriptor.

Syntax

Fortran:

Status = DftiCopyDescriptor( Desc_Handle_Original, Desc_Handle_Copy )

C:

status = DftiCopyDescriptor( desc_handle_original, &desc_handle_copy );

Description

This function makes a copy of an existing descriptor and provides a pointer to it. The purposeis that all information of the original descriptor will be maintained even if the original is destroyedvia the free descriptor function DftiFreeDescriptor.



INTERFACE DftiCopyDescriptor

! Note that the body provided here is to illustrate the different



!Users can only rely on the function name following the

!keyword INTERFACE

FUNCTION some_actual_function_2( Desc_Handle_Original,

Desc_Handle_Copy )

INTEGER :: some_actual_function_2

TYPE(DFTI_DESCRIPTOR), POINTER :: Desc_Handle_Original, Desc_Handle_Copy


END INTERFACE DftiCopyDescriptor

2456


/* C prototype */

long DftiCopyDescriptor( DFTI_DESCRIPTOR_HANLDE, DFTI_DESCRIPTOR_HANDLE *);

FreeDescriptorFrees memory allocated for a descriptor.

Syntax

Fortran:

Status = DftiFreeDescriptor( Desc_Handle )

C:

status = DftiFreeDescriptor( &desc_handle );

Description

This function frees up all memory space allocated for a descriptor.

NOTE. Memory allocation/deallocation inside Intel MKL is managed by Intel MKL MemoryManager. So, even after successful completion of FreeDescriptor, the memory spacemay continue being allocated for the application because the Memory Manager sometimesdoesn’t return the memory space to OS but considers the space free and can reuse itfor future memory allocation. See Example “MKL_FreeBuffers Usage with DFT Functions”in the description of the service function FreeBuffers on how to use Intel MKL MemoryManager and actually release memory.


2457



INTERFACE DftiFreeDescriptor





//keyword INTERFACE

FUNCTION some_actual_function_3( Desc_Handle )

INTEGER :: some_actual_function_3



END INTERFACE DftiFreeDescriptor

/* C prototype */

long DftiFreeDescriptor( DFTI_DESCRIPTOR_HANDLE * );


There are two functions in this category: compute the forward transform, and compute thebackward transform.

ComputeForwardComputes the forward DFT.

Syntax

Fortran:

Status = DftiComputeForward( Desc_Handle, X_inout )

Status = DftiComputeForward( Desc_Handle, X_in, X_out )

2458


C:

status = DftiComputeForward( desc_handle, x_inout );

status = DftiComputeForward( desc_handle, x_in, x_out );

Description

As soon as a descriptor is configured and committed successfully, actual computation of DFTcan be performed. The DftiComputeForward function computes the forward DFT.

This is the transform using the factor e-i2π/n. Because of the flexibility in configuration, input

data can be represented in various ways as well as output result can be placed differently.Consequently, the number of input parameters as well as their type vary. This variation isaccommodated by the generic function facility of Fortran 95. Data and result parameters areall declared as assumed-size rank-1 array DIMENSION(0:*).


2459


Interface and Prototype//Fortran interface.

INTERFACE DftiComputeFoward





//keyword INTERFACE

// One argument single precision complex

FUNCTION some_actual_function_4_C( Desc_Handle, X )

INTEGER :: some_actual_function_4_C


COMPLEX, INTENT(INOUT) :: X(*)

END FUNCTION some_actual_function_4_C

// One argument double precision complex

FUNCTION some_actual_function_4_Z( Desc_Handle, X )

INTEGER :: some_actual_function_4_Z


COMPLEX (Kind((0D0,0D0))), INTENT(INOUT) :: X(*)

END FUNCTION some_actual_function_4_Z

// One argument single precision real

FUNCTION some_actual_function_4_R( Desc_Handle, X )

INTEGER :: some_actual_function_4_R


REAL, INTENT(INOUT) :: X(*)

END FUNCTION some_actual_function_4_R

// One argument double precision real

...

2460


// Two argument single precision complex

...

...

FUNCTION some_actual_function_4_CC( Desc_Handle, X_In, Y_Out )

INTEGER :: some_actual_function_4_CC


COMPLEX, INTENT(IN) :: X_In(*)

COMPLEX, INTENT(OUT) :: Y_Out(*)

END FUNCTION some_actual_function_4_CC

END INTERFACE DftiComputeFoward

/* C prototype */

long DftiComputeForward( DFTI_DESCRIPTOR_HANDLE,

void *,

... );

The implementations of DFT interface expect the data be treated as data stored linearly inmemory with a regular "stride" pattern (discussed more fully in Strides, see also [3]). Thefunction expects the starting address of the first element. Hence we use the assume-sizedeclaration in Fortran.

The descriptor by itself contains sufficient information to determine exactly how many argumentsand of what type should be present. The implementation could use this information to checkagainst possible input inconsistency.

2461


ComputeBackwardComputes the backward DFT.

Syntax

Fortran:

Status = DftiComputeBackward( Desc_Handle, X_inout )

Status = DftiComputeBackward( Desc_Handle, X_in, X_out )

C:

status = DftiComputeBackward( desc_handle, x_inout );

status = DftiComputeBackward( desc_handle, x_in, x_out );

Description

As soon as a descriptor is configured and committed successfully, actual computation of DFTcan be performed. The DftiComputeBackward function computes the backward DFT.

This is the transform using the factor ei2π/n. Because of the flexibility in configuration, input

data can be represented in various ways as well as output result can be placed differently.Consequently, the number of input parameters as well as their type vary. This variation isaccommodated by the generic function facility of Fortran 95. Data and result parameters areall declared as assumed-size rank-1 array DIMENSION(0:*). The function returnsDFTI_NO_ERROR when completes successfully. See Status Checking Functions for moreinformation on returned status.

2462


Interface and Prototype

INTERFACE DftiComputeBackward





//keyword INTERFACE

// One argument single precision complex

FUNCTION some_actual_function_5_C( Desc_Handle, X )

INTEGER :: some_actual_function_5_C


COMPLEX, INTENT(INOUT) :: X(*)

END FUNCTION some_actual_function_5_C

// One argument double precision complex

FUNCTION some_actual_function_5_Z( Desc_Handle, X )

INTEGER :: some_actual_function_5_Z


COMPLEX (Kind((0D0,0D0))), INTENT(INOUT) :: X(*)

END FUNCTION some_actual_function_5_Z

// One argument single precision real

FUNCTION some_actual_function_5_R( Desc_Handle, X )

INTEGER :: some_actual_function_5_R


REAL, INTENT(INOUT) :: X(*)

END FUNCTION some_actual_function_5_R

// One argument double precision real

...

// Two argument single precision complex

2463


...

...

FUNCTION some_actual_function_5_CC( Desc_Handle, X_In, Y_Out )

INTEGER :: some_actual_function_5_CC


COMPLEX, INTENT(IN) :: X_In(*)

COMPLEX, INTENT(OUT) :: Y_Out(*)

END FUNCTION some_actual_function_5_CCEND INTERFACE DftiComputeBackward ENDINTERFACE DftiComputeBackward

/* C prototype */

long DftiComputeBackward( DFTI_DESCRIPTOR_HANDLE,

void *,

... );

The implementations of DFT interface expect the data be treated as data stored linearly inmemory with a regular "stride" pattern (discussed more fully in Strides, see also [3]). Thefunction expects the starting address of the first element. Hence we use the assume-sizedeclaration in Fortran.

The descriptor by itself contains sufficient information to determine exactly how many argumentsand of what type should be present. The implementation could use this information to checkagainst possible input inconsistency.


There are two functions in this category: the value setting function DftiSetValue sets oneparticular configuration parameter to an appropriate value, and the value getting functionDftiGetValue reads the values of one particular configuration parameter. While all configurationparameters are readable, a few of them cannot be set by user. Some of these contain fixedinformation of a particular implementation such as version number, or dynamic information,but nevertheless are derived by the implementation during execution of one of the functions.See Configuration Settings for details.

2464


SetValueSets one particular configuration parameter withthe specified configuration value.

Syntax

Fortran:

Status = DftiSetValue( Desc_Handle, &

Config_Param, &

Config_Val )

C:

status = DftiSetValue( desc_handle, config_param, config_val );

Description

This function sets one particular configuration parameter with the specified configuration value.The configuration parameter is one of the named constants listed in Table 11-3, and theconfiguration value is the corresponding appropriate type, which can be a named constant ora native type. See Configuration Settings for details of the meaning of the setting.


2465



INTERFACE DftiSetValue





//keyword INTERFACE

FUNCTION some_actual_function_6_INTVAL( Desc_Handle, Config_Param, INTVAL)

INTEGER :: some_actual_function_6_INTVAL

Type(DFTI_DESCRIPTOR), POINTER :: Desc_Handle

INTEGER, INTENT(IN) :: Config_Param

INTEGER, INTENT(IN) :: INTVAL

END FUNCTION some_actual_function_6_INTVAL

FUNCTION some_actual_function_6_SGLVAL( Desc_Handle, Config_Param, SGLVAL)

INTEGER :: some_actual_function_6_SGLVAL



REAL, INTENT(IN) :: SGLVAL

END FUNCTION some_actual_function_6_SGLVAL

2466


FUNCTION some_actual_function_6_DBLVAL( Desc_Handle, Config_Param, DBLVAL)

INTEGER :: some_actual_function_6_DBLVAL



REAL (KIND(0D0)), INTENT(IN) :: DBLVAL

END FUNCTION some_actual_function_6_DBLVAL

FUNCTION some_actual_function_6_INTVEC( Desc_Handle, Config_Param, INTVEC)

INTEGER :: some_actual_function_6_INTVEC



INTEGER, INTENT(IN) :: INTVEC(*)

END FUNCTION some_actual_function_6_INTVEC

FUNCTION some_actual_function_6_CHARS( Desc_Handle, Config_Param, CHARS )

INTEGER :: some_actual_function_6_CHARS



CHARCTER(*), INTENT(IN) :: CHARS

END FUNCTION some_actual_function_6_CHARS

END INTERFACE DftiSetValue

/* C prototype */

long DftiSetValue( DFTI_DESCRIPTOR_HANDLE, DFTI_CONFIG_PARAM , ... );

2467


GetValueGets the configuration value of one particularconfiguration parameter.

Syntax

Fortran:

Status = DftiGetValue( Desc_Handle, &

Config_Param, &

Config_Val )

C:

status = DftiGetValue( desc_handle, config_param, &config_val );

Description

This function gets the configuration value of one particular configuration parameter. Theconfiguration parameter is one of the named constants listed in Table 11-3 and Table 11-4,and the configuration value is the corresponding appropriate type, which can be a namedconstant or a native type.


2468



INTERFACE DftiGetValue





//keyword INTERFACE

FUNCTION some_actual_function_7_INTVAL( Desc_Handle, Config_Param, INTVAL)

INTEGER :: some_actual_function_7_INTVAL



INTEGER, INTENT(OUT) :: INTVAL

END FUNCTION DFTI_GET_VALUE_INTVAL

FUNCTION some_actual_function_7_SGLVAL( Desc_Handle, Config_Param, SGLVAL)

INTEGER :: some_actual_function_7_SGLVAL



REAL, INTENT(OUT) :: SGLVAL

END FUNCTION some_actual_function_7_SGLVAL

2469


FUNCTION some_actual_function_7_DBLVAL( Desc_Handle, Config_Param, DBLVAL)

INTEGER :: some_actual_function_7_DBLVAL



REAL (KIND(0D0)), INTENT(OUT) :: DBLVAL

END FUNCTION some_actual_function_7_DBLVAL

FUNCTION some_actual_function_7_INTVEC( Desc_Handle, Config_Param, INTVEC)

INTEGER :: some_actual_function_7_INTVEC



INTEGER, INTENT(OUT) :: INTVEC(*)

END FUNCTION some_actual_function_7_INTVEC

FUNCTION some_actual_function_7_INTPNT( Desc_Handle, Config_Param, INTPNT)

INTEGER :: some_actual_function_7_INTPNT



INTEGER, DIMENSION(*), POINTER :: INTPNT

END FUNCTION some_actual_function_7_INTPNT

FUNCTION some_actual_function_7_CHARS( Desc_Handle, Config_Param, CHARS )

INTEGER :: some_actual_function_7_CHARS



CHARCTER(*), INTENT(OUT):: CHARS

END FUNCTION some_actual_function_7_CHARS

END INTERFACE DftiGetValue

2470


/* C prototype */

long DftiGetValue( DFTI_DESCRIPTOR_HANDLE,

DFTI_CONFIG_PARAM ,

... );

Configuration Settings

Each of the configuration parameters is identified by a named constant in the MKL_DFTI module.In C, these named constants have the enumeration type DFTI_CONFIG_PARAM. The list ofconfiguration parameters whose values can be set by user is given in Table 11-3; the list ofconfiguration parameters that are read-only is given in Table 11-4. All parameters are readable.Most of these parameters are self-explanatory, while some others are discussed more fully inthe description of the relevant functions.

Table 11-3 Settable Configuration Parameters

CommentsValue TypeNamed Constants

Most common configurations, no default, must be set explicitly

Precision of computationNamed constantDFTI_PRECISION

Domain for the forwardtransform

Named constantDFTI_FORWARD_DOMAIN

Dimension of the transformInteger scalarDFTI_DIMENSION

Lengths of each dimensionIntegerscalar/array

DFTI_LENGTHS

Common configurations including multiple transform and data representation

For multiple number oftransforms

Integer scalarDFTI_NUMBER_OF_TRANSFORMS

Scale factor for forwardtransform

Floating-pointscalar

DFTI_FORWARD_SCALE

Scale factor for backwardtransform

Floating-pointscalar

DFTI_BACKWARD_SCALE

2471



Placement of the computationresult

Named constantDFTI_PLACEMENT

Storage method, complexdomain data

Named constantDFTI_COMPLEX_STORAGE

Storage method, real domaindata

Named constantDFTI_REAL_STORAGE

Storage method, conjugateeven domain data

Named constantDFTI_CONJUGATE_EVEN_STORAGE

No longer thanDFTI_MAX_NAME_LENGTH

Character stringDFTI_DESCRIPTOR_NAME

Packed format, real domaindata

Named constantDFTI_PACKED_FORMAT

Number of user threadsemploying the same descriptorfor DFT computation

Integer scalarDFTI_NUMBER_OF_USER_THREADS

Configurations regarding stride of data

Multiple transforms, distanceof first elements

Integer scalarDFTI_INPUT_DISTANCE

Multiple transforms, distanceof first elements

Integer scalarDFTI_OUTPUT_DISTANCE

Stride information of input dataInteger arrayDFTI_INPUT_STRIDES

Stride information of outputdata

Integer arrayDFTI_OUTPUT_STRIDES

Advanced configuration

Scrambling of data orderNamed constantDFTI_ORDERING

Scrambling of dimensionNamed constantDFTI_TRANSPOSE

2472


Table 11-4Read-Only Configuration Parameters


Whether descriptor has been committedName constantDFTI_COMMIT_STATUS

Intel MKL library version numberStringDFTI_VERSION

The configuration parameters are set by various values. Some of these values are specified bynative data types such as an integer value (for example, number of simultaneous transformsrequested), or a single-precision number (for example, the scale factor one would like to applyon a forward transform).

Other configuration values are discrete in nature (for example, the domain of the forwardtransform) and are thus provided in the DFTI module as named constants. In C, these namedconstants have the enumeration type DFTI_CONFIG_VALUE. The complete list of named constantsused for this kind of configuration values is given in Table 11-5.

Table 11-5 Named Constant Configuration Values

CommentsNamed Constant

Single precisionDFTI_SINGLE

Double precisionDFTI_DOUBLE

Complex domainDFTI_COMPLEX

Real domainDFTI_REAL

Output overwrites inputDFTI_INPLACE

Output does not overwrite inputDFTI_NOT_INPLACE

Storage method (see Storage schemes)DFTI_COMPLEX_COMPLEX

Storage method (see Storage schemes)DFTI_REAL_REAL

Storage method (see Storage schemes)DFTI_COMPLEX_REAL

Storage method (see Storage schemes)DFTI_REAL_COMPLEX

Committal status of a descriptorDFTI_COMMITTED

Committal status of a descriptorDFTI_UNCOMMITTED

2473


CommentsNamed Constant

Data ordered in both forward and backward domainsDFTI_ORDERED

Data scrambled in backward domain (by forward transform)DFTI_BACKWARD_SCRAMBLED

Used to specify no transpositionDFTI_NONE

Packed format, real data (see Packed formats)DFTI_CCS_FORMAT

Packed format, real data (see Packed formats)DFTI_PACK_FORMAT

Packed format, real data (see Packed formats)DFTI_PERM_FORMAT

Packed format, real data (see Packed formats)DFTI_CCE_RORMAT

Number of characters for library version lengthDFTI_VERSION_LENGTH

Maximum descriptor name lengthDFTI_MAX_NAME_LENGTH

Maximum status message lengthDFTI_MAX_MESSAGE_LENGTH

Table 11-6 lists the possible values for those configuration parameters that are discrete innature.

Table 11-6 Settings for Discrete Configuration Parameters

Possible ValuesNamed Constant

DFTI_SINGLE, orDFTI_PRECISION

DFTI_DOUBLE (no default)

DFTI_COMPLEX, orDFTI_FORWARD_DOMAIN

DFTI_REAL

DFTI_INPLACE (default), orDFTI_PLACEMENT

DFTI_NOT_INPLACE

DFTI_COMPLEX_COMPLEX (default), orDFTI_COMPLEX_STORAGE

DFTI_REAL_REAL (default), orDFTI_REAL_STORAGE

2474


Possible ValuesNamed Constant

DFTI_REAL_COMPLEX

DFTI_COMPLEX_COMPLEX, orDFTI_CONJUGATE_EVEN_STORAGE

DFTI_COMPLEX_REAL (default)

DFTI_CCS_FORMAT (default), orDFTI_PACKED_FORMAT

DFTI_PACK_FORMAT, or

DFTI_PERM_FORMAT, or

DFTI_CCE_FORMAT

Table 11-7 lists the default values of the settable configuration parameters.

Table 11-7 Default Configuration Values of Settable Parameters

Default ValueNamed Constants

1DFTI_NUMBER_OF_TRANSFORMS

1DFTI_NUMBER_OF_USER_THREADS

1.0DFTI_FORWARD_SCALE

1.0DFTI_BACKWARD_SCALE

DFTI_INPLACEDFTI_PLACEMENT

DFTI_COMPLEX_COMPLEXDFTI_COMPLEX_STORAGE

DFTI_REAL_REALDFTI_REAL_STORAGE

DFTI_COMPLEX_REALDFTI_CONJUGATE_EVEN_STORAGE

DFTI_CCS_FORMATDFTI_PACKED_FORMAT

no name, string of zero lengthDFTI_DESCRIPTOR_NAME

0DFTI_INPUT_DISTANCE

0DFTI_OUTPUT_DISTANCE

2475


Default ValueNamed Constants

Tightly packed according to dimension and FFT lengthsDFTI_INPUT_STRIDES

Same as above. see Strides for detailsDFTI_OUTPUT_STRIDES

DFTI_ORDEREDDFTI_ORDERING

DFTI_NONEDFTI_TRANSPOSE

Precision of transform

The configuration parameter DFTI_PRECISION denotes the floating-point precision in whichthe transform is to be carried out. A setting of DFTI_SINGLE stands for single precision, anda setting of DFTI_DOUBLE stands for double precision. The data is meant to be presented inthis precision; the computation will be carried out in this precision; and the result will bedelivered in this precision. This is one of the four settable configuration parameters that do nothave default values. The user must set them explicitly, most conveniently at the call to descriptorcreation function DftiCreateDescriptor.

Forward domain of transform

The general form of the discrete Fourier transform is

for k1 = 0,±1,±2, ... , where σ is an arbitrary real-valued scale factor and δ = ±1. The

forward transform is defined by σ = 1 and δ = -1. In most common situations, the domainof the forward transform, that is, the set where the input (periodic) sequence {wj1, j2, ...,

jd} belongs, can be either the set of complex-valued sequences, real-valued sequences, andcomplex-valued conjugate even sequences. The configuration parameter DFTI_FORWARD_DOMAINindicates the domain for the forward transform. Note that this implicitly specifies the domainfor the backward transform because of mathematical property of the DFT. See Table 11-8 fordetails.

2476


Table 11-8 Correspondence of Forward and Backward Domain

Implied Backward DomainForward Domain

Complex(DFTI_COMPLEX)Complex

Conjugate Even(DFTI_REAL)Real

On transforms in the real domain, some software packages only offer one "real-to-complex"transform. This in essence omits the conjugate even domain for the forward transform. Theforward domain configuration parameter DFTI_FORWARD_DOMAIN is the second of fourconfiguration parameters without default value.

Transform dimension and lengths

The dimension of the transform is a positive integer value represented in an integer scalar ofInteger data type in Fortran and long data type in C. For one-dimensional transform, thetransform length is specified by a positive integer value represented in an integer scalar ofInteger data type in Fortran and long data type in C. For multi-dimensional (≥ 2) transform,the lengths of each of the dimension is supplied in an integer array (Integer data type inFortran and long data type in C). DFTI_DIMENSION and DFTI_LENGTHS are the remaining twoof four configuration parameters without default.

As mentioned, these four configuration parameters do not have default value. They are mostconveniently set at the descriptor creation function. They can only be set in the descriptorcreation function, and not by the function DftiSetValue.

Number of transforms

In some situations, the user may need to perform a number of DFT transforms of the samedimension and lengths. The most common situation would be to transform a number ofone-dimensional data of the same length. This parameter has the default value of 1, and canbe set to positive integer value by an Integer data type in Fortran and long data type in C.Data sets have no common elements. The distance parameter is obligatory if multiple numberis more than one.

Scale

The forward transform and backward transform are each associated with a scale factor σ of itsown with default value of 1. The user can set one or both of them via the two configurationparameters DFTI_FORWARD_SCALE and DFTI_BACKWARD_SCALE. For example, for a

2477


one-dimensional transform of length n, one can use the default scale of 1 for the forwardtransform while setting the scale factor for backward transform to be 1/n, making the backwardtransform the inverse of the forward transform.

The scale factor configuration parameter should be set by a real floating-point data type of thesame precision as the value for DFTI_PRECISION.

Placement of result

By default, the computational functions overwrite the input data with the output result. Thatis, the default setting of the configuration parameter DFTI_PLACEMENT is DFTI_INPLACE. Theuser can change that by setting it to DFTI_NOT_INPLACE. Data sets have no common elements.

Packed formats

The result of the forward transform (i.e. in the frequency-domain) of real data is representedin several possible packed formats: Pack, Perm, CCS, or CCE. The data can be packed due tothe symmetry property of the DFT transform of a real data.

The CCE format stores the values of the first half of the output complex conjugate-even signalresulted from the forward DFT. Note that the one-dimensional signal stored in CCE format isone complex element longer. For multi-dimensional real transform, n1 * n2 * n3 * ... *nk the size of complex matrix in CCE format is (n1/2+1)* n2 * n3 * ... * nk for Fortranand n1 * n2 * ... * (nk/2+1) for C.

The CCS format looks like the CCE format. It is the same format as CCE for one-dimensionaltransform. The CCS format is slightly different for multi-dimensional real transform. In CCSformat, the output samples of the DFT are arranged as shown in Table 11-9 for one-dimensionalDFT and in Table 11-10 for two-dimensional DFT.

The Pack format is a compact representation of a complex conjugate-symmetric sequence.The disadvantage of this format is that it is not the natural format used by the real DFTalgorithms ("natural" in the sense that array is natural for complex DFTs). In Pack format, theoutput samples of the DFT are arranged as shown in Table 11-9 for one-dimensional DFT andin Table 11-11 for two-dimensional DFT.

The Perm format is an arbitrary permutation of the Pack format for even lengths and one isthe same as the Pack format for odd lengths. In Perm format, the output samples of the DFTare arranged as shown in Table 11-9 for one-dimensional DFT and in Table 11-12 fortwo-dimensional DFT.

Table 11-9 Packed Format Output Samples

For n = s*2

n+1nn-1n-2...3210DFT Real

2478


For n = s*2

0Rn/2In/2-1Rn/2-1...I1R10R0CCS

Rn/2In/2-1...R2I1R1R0Pack

In/2-1Rn/2-1...I1R1Rn/2R0Perm

For n = s*2 + 1

n+1nn-1n-2n-3n-4...3210DFT Real

IsRsIs-1Rs-1Is-2...I1R10R0CCS

IsRsIs-1Rs-1...R2I1R1R0Pack

IsRsIs-1Rs-1...R2I1R1R0Perm

Note that Table 11-9 uses the following notation for complex data entries:

Rj = Re zj

Ij = Im zj

See also Table 11-13 and Table 11-14.

Table 11-10 CCS Format Output Samples (Two-Dimensional Matrix (m+2)-by-(n+2))

For m = s*2, n = k*2

0z(1,k+1)IMz(1,k)REz(1,k)...IMz(1,2)REz(1,2)0z(1,1)

0000...0000

n/un/u*REz(2,n)REz(2,n-1)...REz(2,4)REz(2,3)REz(2,2)REz(2,1)

n/un/uIMz(2,n)IMz(2,n-1)...IMz(2,4)IMz(2,3)IMz(2,2)IMz(2,1)

n/un/u.....................

n/un/uREz(m/2,n)REz(m/2,n-1)...REz(m/2,4)REz(m/2,3)REz(m/2,2)REz(m/2,1)

n/un/uIMz(m/2,n)IMz(m/2,n-1)...IMz(m/2,4)IMz(m/2,3)IMz(m/2,2)IMz(m/2,1)

2479


For m = s*2, n = k*2

0z(m/2+1,k+1)IMz(m/2+1,k)REz(m/2+1,k)...IMz(m/2+1,2)REz(m/2+1,2)0z(m/2+1,1)

n/un/u00...0000

For m = s*2+1, n = k*2

0z(1,k+1)IMz(1,k)REz(1,k)...IMz(1,2)REz(1,2)0z(1,1)

0000...0000

n/un/uREz(2,n)REz(2,n-1)...REz(2,4)REz(2,3)REz(2,2)REz(2,1)

n/un/uIMz(2,n)IMz(2,n-1)...IMz(2,4)IMz(2,3)IMz(2,2)IMz(2,1)

n/un/u.....................

n/un/uREz(s,n)REz(s,n-1)...REz(s,4)REz(s,3)REz(s,2)REz(s,1)

n/un/uIMz(s,n)IMz(s,n-1)...IMz(s,4)IMz(s,3)IMz(s,2)IMz(s,1)

For m = s*2, n = k*2+1

IM z(1,k)REz(1,k)IMz(1,k-1)...IMz(1,2)REz(1,2)0z(1,1)

000...0000

n/u*REz(2,n)REz(2,n-1)...REz(2,4)REz(2,3)REz(2,2)REz(2,1)

n/uIMz(2,n)IMz(2,n-1)...IMz(2,4)IMz(2,3)IMz(2,2)IMz(2,1)

n/u.....................

n/uREz(m/2,n)REz(m/2,n-1)...REz(m/2,4)REz(m/2,3)REz(m/2,2)REz(m/2,1)

n/uIMz(m/2,n)IMz(m/2,n-1)...IMz(m/2,4)IMz(m/2,3)IMz(m/2,2)IMz(m/2,1)

IMz(m/2+1,k)REz(m/2+1,k)IMz(m/2+1,k-1)...IMz(m/2+1,2)REz(m/2+1,2)0z(m/2+1,1)

n/u00...0000

2480


For m = s*2+1, n = k*2+1

IMz(1,k)REz(1,k)IMz(1,k-1)...IMz(1,2)REz(1,2)0z(1,1)

000...0000

n/uREz(2,n)REz(2,n-1)...REz(2,4)REz(2,3)REz(2,2)REz(2,1)

n/uIMz(2,n)IMz(2,n-1)...IMz(2,4)IMz(2,3)IMz(2,2)IMz(2,1)

n/u.....................

n/uREz(s,n)REz(s,n-1)...REz(s,4)REz(s,3)REz(s,2)REz(s,1)

n/uIMz(s,n)IMz(s,n-1)...IMz(s,4)IMz(s,3)IMz(s,2)IMz(s,1)

* n/u - not used.

Note that in the Table 11-10, (n+2) columns are used for even n = k*2, while n columns areused for odd n = k*2+1. In the latter case, the first row is

z(1,1) 0 REz(1,2) IMz(1,2) ... REz(1,k) IMz(1,k)

If m is even, the (m+1)-th row is

z(m/2+1,1) 0 REz(m/2+1,2) IMz(m/2+1,2) ... REz(m/2+1,k) IMz(m/2+1,k)

Table 11-11 Pack Format Output Samples (Two-Dimensional Matrix m-by-n)

For m = s*2

z(1,k+1)IMz(1,k)...REz(1,3)IMz(1,2)REz(1,2)z(1,1)

REz(2,n)REz(2,n-1)...REz(2,4)REz(2,3)REz(2,2)REz(2,1)

IMz(2,n)IMz(2,n-1)...IMz(2,4)IMz(2,3)IMz(2,2)IMz(2,1)

.....................

REz(m/2,n)REz(m/2,n-1)...REz(m/2,4)REz(m/2,3)REz(m/2,2)REz(m/2,1)

IMz(m/2,n)IMz(m/2,n-1)...IMz(m/2,4)IMz(m/2,3)IMz(m/2,2)IMz(m/2,1)

z(m/2+1,k+1)IMz(m/2+1,k)...REz(m/2+1,3)IMz(m/2+1,2)REz(m/2+1,2)z(m/2+1,1)

2481


For m = s*2+1

z(1,n/2+1)IMz(1,k)...REz(1,3)IMz(1,2)REz(1,2)z(1,1)



.....................

REz(s,n)REz(s,n-1)...REz(s,4)REz(s,3)REz(s,2)REz(s,1)

IMz(s,n)IMz(s,n-1)...IMz(s,4)IMz(s,3)IMz(s,2)IMz(s,1)

Table 11-12 Perm Format Output Samples (Two-Dimensional Matrix m-by-n)

For m = s*2

IMz(1,k)REz(1,k)...IMz(1,2)REz(1,2)z(1,k+1)z(1,1)

IMz(m/2+1,k)REz(m/2+1,k)...IMz(m/2+1,2)REz(m/2+1,2)z(m/2+1,k+1)z(m/2+1,1)



.....................

REz(m/2,n)REz(m/2,n-1)...REz(m/2,4)REz(m/2,3)REz(m/2,2)REz(m/2,1)

IMz(m/2,n)IMz(m/2,n-1)...IMz(m/2,4)IMz(m/2,3)IMz(m/2,2)IMz(m/2,1)

For m = s*2+1

IMz(1,k)REz(1,k)...IMz(1,2)REz(1,2)z(1,k+1)z(1,1)



.....................

2482


For m = s*2+1

REz(s,n)REz(s,n-1)...REz(s,4)REz(s,3)REz(s,2)REz(s,1)

IMz(s,n)IMz(s,n-1)...IMz(s,4)IMz(s,3)IMz(s,2)IMz(s,1)

Note that in the Table 11-11 and Table 11-12, for even number of columns n = k*2, while forodd number of columns n = k*2+1 and the first row is

z(1,1) REz(1,2) IMz(1,2) ... REz(1,k) IMz(1,k)

If m is even, the last row in Pack format and the second row in Perm format is

z(m/2+1,1) REz(m/2+1,2) IMz(m/2+1,2) ... REz(m/2+1,k) IMz(m/2+1,k)

The tables for two-dimensional DFT use Fortran-interface conventions. For C-interface specificsin storing packed data, see Storage schemes section below. See also Table 11-15 and Table11-16 for examples of Fortran-interface and C-interface formats.

Storage schemes

For each of the three domains DFTI_COMPLEX, DFTI_REAL, and DFTI_CONJUGATE_EVEN (forthe forward as well as the backward operator), a subset of the four storage schemesDFTI_COMPLEX_COMPLEX, DFTI_COMPLEX_REAL, DFTI_REAL_COMPLEX, and DFTI_REAL_REALis provided. Specific examples are presented here to illustrate the storage schemes. See thedocument [3] for the rationale behind this definition of the storage schemes.

NOTE. The data is stored in the Fortran style only, that is, the real and imaginary partsare stored side by side.

Storage scheme for complex domain. This setting is recorded in the configuration parameterDFTI_COMPLEX_STORAGE. The three values that can be set are DFTI_COMPLEX_COMPLEX,DFTI_COMPLEX_REAL, and DFTI_REAL_REAL. Consider a one-dimensional n-length transformof the form

2483


Assume the stride has default value (unit stride) and DFTI_PLACEMENT has the default in-placesetting.

DFTI_COMPLEX_COMPLEX storage scheme (by default). A typical usage will be as follows.

COMPLEX :: X(0:n-1)

...some other code...

Status = DftiComputeForward( Desc_Handle, X )

On input,

X(j) = wj, j = 0,1,...,n-1.

On output,

X(k) = zk , k = 0,1,...,n-1.

Storage scheme for the real and conjugate even domains. This setting for the storageschemes for these domains is recorded in the configuration parameters DFTI_REAL_STORAGEand DFTI_CONJUGATE_EVEN_STORAGE. Since a forward real domain corresponds to a conjugateeven backward domain, they are considered together. The example uses one-, two- andthree-dimensional real to conjugate even transforms. In-place computation is assumed wheneverpossible (that is, when the input data type matches the output data type).

One-Dimensional Transform

Consider a one-dimensional n-length transform of the form

There is a symmetry:

For even n: z(n/2+i) = conjg(z(n/2-i)), 1≤i≤n/2-1, and moreover z(0) and z(n/2)are real values.

For odd n: z(m+i) = conjg(z(m-i+1)), 1≤i≤m, and moreover z(0) is real value.

m = floor(n/2).

2484


Table 11-13 Comparison of the Storage Effects of Complex-to-Complex andReal-to-Complex DFTs for Forward Transform

N=8

Output VectorsInput Vectors

real DFTcomplex DFTRealDFT

Complex DFT

Real DataComplex DataRealData

Complex Data

PermPackCCSImaginaryRealImaginaryReal

z0z0z00.000000z0w00.000000w0

z4Re(z1)0.000000Im(z1)Re(z1)w10.000000w1

Re(z1)Im(z1)Re(z1)Im(z2)Re(z2)w20.000000w2

Im(z1)Re(z2)Im(z1)Im(z3)Re(z3)w30.000000w3

Re(z2)Im(z2)Re(z2)0.000000z4w40.000000w4

Im(z2)Re(z3)Im(z2)-Im(z3)Re(z3)w50.000000w5

Re(z3)Im(z3)Re(z3)-Im(z2)Re(z2)w60.000000w6

Im(z3)z4Im(z3)-Im(z1)Re(z1)w70.000000w7

z4

0.000000

N=7



Complex DFT

2485


N=7


Complex Data


z0z0z00.000000z0w00.000000w0

Re(z1)Re(z1)0.000000Im(z1)Re(z1)w10.000000w1

Im(z1)Im(z1)Re(z1)Im(z2)Re(z2)w20.000000w2

Re(z2)Re(z2)Im(z1)Im(z3)Re(z3)w30.000000w3

Im(z2)Im(z2)Re(z2)-Im(z3)Re(z3)w40.000000w4

Re(z3)Re(z3)Im(z2)-Im(z2)Re(z2)w50.000000w5


Im(z3)

Table 11-14 Comparison of the Storage Effects of Complex-to-Complex andComplex-to-Real DFTs for Backward Transform

N=8



Complex DFT


Complex Data


z0z0z00.000000z0w00.000000w0

2486


N=8

z4Re(z1)0.000000Im(z1)Re(z1)w10.000000w1

Re(z1)Im(z1)Re(z1)Im(z2)Re(z2)w20.000000w2

Im(z1)Re(z2)Im(z1)Im(z3)Re(z3)w30.000000w3

Re(z2)Im(z2)Re(z2)z4w40.000000w4

Im(z2)Re(z3)Im(z2)-Im(z3)Re(z3)w50.000000w5

Re(z3)Im(z3)Re(z3)-Im(z2)Re(z2)w60.000000w6

Im(z3)z4Im(z3)-Im(z1)Re(z1)w70.00000w7

z4

0.000000

N=7



Complex DFT


Complex Data


z0z0z00.000000z0w00.000000w0

Re(z1)Re(z1)0.000000Im(z1)Re(z1)w10.000000w1

Im(z1)Im(z1)Re(z1)Im(z2)Re(z2)w20.000000w2

Re(z2)Re(z2)Im(z1)Im(z3)Re(z3)w30.000000w3

2487


N=7


Re(z3)Re(z3)Im(z2)-Im(z2)Re(z2)w50.000000w5


Im(z3)

Assume that the stride has the default value (unit stride).

This complex conjugate-symmetric vector can be stored in the complex array of size m+1 orin the real array of size 2m+2 or 2m depending on packed format.

Two-Dimensional Transform

Each of the real-to-complex routines computes the forward DFT of a two-dimensional realmatrix according to the mathematical equation

tk,l = cmplx(rk,l,0), where rk,l is a real input matrix, 0≤k≤m-1, 0 ≤l≤n-1. The mathematical

result zi,j, 0≤i≤m-1, 0≤j≤n-1, is the complex matrix of size (m,n). Each column is thecomplex conjugate-symmetric vector as follows:

For even m:

for 0 ≤j≤n-1,

z(m/2+i,j) = conjg(z(m/2-i,j)),1≤i≤m/2-1.

Moreover, z(0,j) and z(m/2,j) are real values for j=0 and j=n/2.

For odd m:

for 0≤j≤n-1,

z(s+i,j) = conjg(z(s-i,j)), 1≤i≤s-1,

2488


where s = floor(m/2).

Moreover, z(0,j) are real values for j=0 and j=n/2.

This mathematical result can be stored in the real two-dimensional array of size:

(m+2,n+2) (CCS format), or

(m,n) (Pack or Perm formats), or

(2*(m/1+1), n) (CCE format, Fortran-interface),

((m, 2*(n/2+1)) (CCE format, C-interface)

or in the complex two-dimensional array of size:

(m/2+1, n) (CCE format, Fortran-interface),

(m, n/2+1) (CCE format, C-interface)

Since the multidimensional array data are arranged differently in Fortran and C (see Strides),the output array that holds the computational result contains complex conjugate-symmetriccolumns (for Fortran) or complex conjugate-symmetric rows (for C).

The following tables give examples of output data layout in Pack format for a forwardtwo-dimensional real-to-complex DFT of a 6-by-4 real matrix. Note that the same layout isused for the input data of the corresponding backward complex-to-real DFT.

Table 11-15 Fortran-interface Data Layout for a 6-by-4 Matrix

z(1,3)Im z(1,2)Re z(1,2)z(1,1)

Re z(2,4)Re z(2,3)Re z(2,2)Re z(2,1)

Im z(2,4)Im z(2,3)Im z(2,2)Im z(2,1)

Re z(3,4)Re z(3,3)Re z(3,2)Re z(3,1)

Im z(3,4)Im z(3,3)Im z(3,2)Im z(3,1)

z(4,3)Im z(4,2)Re z(4,2)z(4,1)

For the above example, the stride array is taken to be (0, 1, 6).

Table 11-16 C-interface Data Layout for a 6-by-4 Matrix

z(1,3)Im z(1,2)Re z(1,2)z(1,1)

2489


Re z(2,3)Im z(2,2)Re z(2,2)Re z(2,1)

Im z(2,3)Im z(3,2)Re z(3,2)Im z(2,1)

Re z(3,3)Im z(4,2)Re z(4,2)Re z(3,1)

Im z(3,3)Im z(5,2)Re z(5,2)Im z(3,1)

z(4,3)Im z(6,2)Re z(6,2)z(4,1)

For the second example, the stride array is taken to be (0, 4, 1).

See also Packed formats.

Three-Dimensional Transform

Each of the real-to-complex routines computes the forward DFT of a three-dimensional realmatrix according to the mathematical equation

tp,l,s = cmplx(rp,l,s,0), where rp,l,s is a real input matrix, 0 ≤ k ≤ m-1, 0 ≤ l ≤ n-1,

0 ≤ s ≤ k-1. The mathematical result zi,j,q, 0 ≤ i ≤ m-1, 0 ≤ j ≤ n-1, 0 ≤ q ≤ k-1is the complex matrix of size (m,n,k), which is a complex conjugate-symmetric, or conjugateeven, matrix as follows:

zm1,n1,k1 = conjg(zm-m1,n-n1,k-k1), where each dimension is periodic.

This mathematical result can be stored in the real three-dimensional array of size:

(m/2+1,n,k) (CCE format, Fortran-interface),

(m,n,k/2+1) (CCE format, C-interface).

Since the multidimensional array data are arranged differently in Fortran and C (see Strides),the output array that holds the computational result contains complex conjugate-symmetriccolumns (for Fortran) or complex conjugate-symmetric rows (for C).

2490


NOTE. There is one packed format for 3D REAL DFT - CCE format. In both in-place andout-of-place REAL DFT, for real data, the stride and distance parameters are in REALunits and for complex data, they are in COMPLEX units. So, elements of input and outputdata can be placed in different elements of input-output array of the in-place FFT.

1. DFTI_REAL_REAL for real domain, DFTI_COMPLEX_REAL for conjugate even domain (bydefault). It is used for 1D and 2D REAL DFT.

• A typical usage of in-place transform is as follows:

// m = floor( n/2 )

REAL :: X(0:2*m+1)


...assuming inplace transform...


On input,

X(j) = wj, j = 0,1,...,n-1.

On output,

Output data stored in one of formats: Pack, Perm or CCS (see Packed formats).

CCS format: X(2*k) = Re(zk) , X(2*k+1) = Im(zk) , k = 0,1,...,m.

Pack format:

even n: X(0) = Re(z0), X(2*k-1) = Re(zk), X(2*k) = Im(zk), k = 1,...,m-1,and X(n-1) = Re(zm)

odd n: X(0) = Re(z0), X(2*k-1) = Re(zk), X(2*k) = Im(zk), k = 1,...,m

Perm format:

even n: X(0) = Re(z0), X(1) = Re(zm), X(2*k) = Re(zk) , X(2*k+1) = Im(zk), k = 1,...,m-1,

odd n: X(0) = Re(z0), X(2*k-1) = Re(zk), X(2*k) = Im(zk), k = 1,...,m.

See Example C-16, Example C-17, Example C-18, and Example C-19.

Input and output data exchange the roles in the backward transform.

2491


• A typical usage of out-of-place transform is as follows:

// m = floor( n/2 )

REAL :: X(0:n-1)

REAL :: Y(0:2*m+1)


...assuming out-of-place transform...

Status = DftiComputeForward( Desc_Handle, X, Y )

On input, X(j) = wj, j = 0,1,...,n-1.

On output,

Output data stored in one of formats: Pack, Perm or CCS (see Packed formats).

CCS format: Y(2*k) = Re(zk) , Y(2*k+1) = Im(zk) , k = 0,1,...,m.

Pack format:

even n: Y(0) = Re(z0), Y(2*k-1) = Re(zk), Y(2*k) = Im(zk), k = 1,...,m-1,and Y(n-1) = Re(zm)

odd n: Y(0) = Re(z0), Y(2*k-1) = Re(zk), Y(2*k) = Im(zk), k = 1,...,m

Perm format:

even n: Y(0) = Re(z0), Y(1) = Re(zm), Y(2*k) = Re(zk) , Y(2*k+1) = Im(zk), k = 1,...,m-1,

odd n: Y(0) = Re(z0), Y(2*k-1) = Re(zk), Y(2*k) = Im(zk), k = 1,...,m.

Notice that if the stride of the output array is not set to the default value unit stride, thereal and imaginary parts of one complex element will be placed with this stride.

For example:

CCS format: Y(2*k*s) = Re(zk) , Y((2*k+1)*s) = Im(zk) , k = 0,1, ..., m,s - stride.

See Example C-16a and Example C-17a.


2. DFTI_REAL_REAL for real domain, DFTI_COMPLEX_COMPLEX for conjugate even domain. Itis used for 1D, 2D and 3D REAL DFT. The CCE format is set by default. You must explicitlyset the storage scheme in this case, because its value is not the default one.

2492


• A typical usage of in-place transform is as follows:

// m = floor( n/2 )

REAL :: X(0:m*2)


...assuming in-place transform...

Status = DftiSetValue( Desc_Handle, DFTI_CONJUGATE_EVEN_STORAGE,DFTI_COMPLEX_COMPLEX)

...

Status = DftiComputeForward( Desc_Handle, X)

On input,

X(j) = wj, j = 0,1,...,n-1.

On output,

X(2*k) = Re(zk), X(2*k+1) = Im(zk), k = 0,1,...,m.

See Example C-24.


• A typical usage of out-of-place transform is as follows:

// m = floor( n/2 )

REAL :: X(0:n-1)

COMPLEX :: Y(0:m)


...assuming out-of-place transform...

Status = DftiSetValue( Desc_Handle, DFTI_CONJUGATE_EVEN_STORAGE,DFTI_COMPLEX_COMPLEX)

...

Status = DftiComputeForward( Desc_Handle, X, Y )

On input,

X(j) = wj, j = 0,1,...,n-1.

On output,

Y(k) = zk , k = 0,1,...,m.

2493


See Example C-24a and Example C-25


3. DFTI_REAL_COMPLEX for real domain, DFTI_COMPLEX_COMPLEX for conjugate even domain.It is not used in the current version. See Note in the “DFT Interface” section for details. Atypical usage is as follows:

// m = floor( n/2 )

COMPLEX :: X(0:m)


...inplace transform...


On input,

X(j) = wj, j = 0,1,...,n-1.

That is, the imaginary parts of X(j) are zero.

On output,

Y(k) = zk, k = 0,1,...,m,

where m is floor( n/2 ).

Number of user threads

Customer application can be parallelized by using the following techniques:

1. You do not create threads in your application but specify the parallel mode within the DFTmodule of Intel MKL. See Intel MKL User's Guide document for more informationon how to do this.

2. You create threads in application yourself and have each thread perform all stages of DFTimplementation including descriptor initialization, DFT computation, and descriptordeallocation. In this case each descriptor is used only within its corresponding thread.

3. You create threads after initializing the DFT descriptor. This implies that threading is employedfor parallel DFT computation only, and the descriptor is freed after return from the parallelregion. In this case each thread uses the same descriptor.

For the first and second cases listed above, set the parameter DFTI_NUMBER_OF_USER_THREADSto 1 (its default value), since each particular descriptor instance is used only in a single thread.

2494


In case 3, you must use the DftiSetValue() function to set theDFTI_NUMBER_OF_USER_THREADS to the actual number of DFT computation threads, becausemultiple threads will be using the same descriptor. If this setting is not done, your programwill work incorrectly or fail, since the descriptor contains individual data for each thread.

1. It is not recommended to simultaneously parallelize your program and employ theIntel MKL internal threading because this will slow down performance. Note that incase 3 above, DFT computation is automatically initiated in a single threading mode.

2. You must not change the number of threads after the DftiCommitDescriptor()function completed DFT initialization.

See Example C-26, Example C-27, and Example C-28 in Appendix C.

Input and output distances

DFT interface in Intel MKL allows the computation of multiple number of transforms.Consequently, the user needs to be able to specify the data distribution of these multiple setsof data. This is accomplished by the distance between the first data element of the consecutivedata sets. This parameter is obligatory if multiple number is more than one. The parameter isa value of Integer data type in Fortran and long data type in C. Data sets don't have anycommon elements. The following example illustrates the specification. Consider computing theforward DFT on three 32-length complex sequences stored in X(0:31, 1), X(0:31, 2), andX(0:31, 3). Suppose the results are to be stored in the locations Y(0:31, k), k = 1, 2, 3,of the array Y(0:63, 3). Thus the input distance is 32, while the output distance is 64. Noticethat the data and result parameters in computation functions are all declared as assumed-sizerank-1 array DIMENSION(0:*). Therefore two-dimensional array must be transformed toone-dimensional array by EQUIVALENCE statement or other facilities of Fortran. Here is thecode fragment:Complex :: X_2D(0:31,3), Y_2D(0:63, 3)

Complex :: X(96), Y(192)

Equivalence (X_2D, X)

Equivalence (Y_2D, Y)

...................

Status = DftiCreateDescriptor(Desc_Handle, DFTI_SINGLE,

2495


DFTI_COMPLEX, 1, 32)

Status = DftiSetValue(Desc_Handle, DFTI_NUMBER_OF_TRANSFORMS, 3)

Status = DftiSetValue(Desc_Handle, DFTI_INPUT_DISTANCE, 32)

Status = DftiSetValue(Desc_Handle, DFTI_OUTPUT_DISTANCE, 64)

Status = DftiSetValue(Desc_Handle, DFTI_PLACEMENT, DFTI_NOT_INPLACE)

Status = DftiCommitDescriptor(Desc_Handle)

Status = DftiComputeForward(Desc_Handle, X, Y)

Status = DftiFreeDescriptor(Desc_Handle)

Strides

In addition to supporting transforms of multiple number of datasets, DFT interface supportsnon-unit stride distribution of data within each data set. The parameter is an array of valuesof Integer data type in Fortran and long data type in C. Consider the following situation where

a 32-length DFT is to be computed on the sequence xj, 0≤j<32. The actual location of thesevalues are in X(5), X(7), ..., X(67) of an array X(1:68). The stride accommodated byDFT interface consists of a displacement from the first element of the data array L0, (4 in thiscase), and a constant distance of consecutive elements L1 (2 in this case). Thus for the Fortranarray X

xj = X(1 + L0 + L1 * j) = X(5 + L1 * j).

2496

This stride vector (4,2) is provided by a length-2 rank-1 integer array:COMPLEX :: X(68)

INTEGER :: Stride(2)

...................

Status = DftiCreateDescriptor(Desc_Handle, DFTI_SINGLE,

DFTI_COMPLEX, 1, 32)

Stride = (/ 4, 2 /)

Status = DftiSetValue(Desc_Handle, DFTI_INPUT_STRIDES, Stride)

Status = DftiSetValue(Desc_Handle, DFTI_OUTPUT_STRIDES, Stride)

Status = DftiCommitDescriptor(Desc_Handle)

Status = DftiComputeForward(Desc_Handle, X)

Status = DftiFreeDescriptor(Desc_Handle)

In general, for a d-dimensional transform, the stride is provided by a d+1-length integer vector(L0, L1, L2, ..., Ld) with the meaning:

L0 = displacement from the first array element

L1 = distance between consecutive data elements in the first dimension

L2 = distance between consecutive data elements in the second dimension

... = ...

Ld = distance between consecutive data elements in the d-th dimension.

A d-dimensional data sequence

Xj1 , j2 , ... , jd, 0 ≤ ji ≤ J i , 1 ≤ i ≤ d

will be stored in the rank-1 array X by the mapping

Xj1 , j2 , ... , jd= X (first index + L0 + j1 L1 + j2 L2 + ... + jd Ld) .

For multiple transforms, the value L0 applies to the first data sequence, and Lj, j = 1, 2,...,d apply to all the data sequences.

In the case of a single one-dimensional sequence, L1 is simply the usual stride. The defaultsetting of strides in the general multi-dimensional situation corresponds to the case where thesequences are distributed tightly into the array:

2497


Both the input data and output data have a stride associated with it. The default is set inaccordance with the data to be stored contiguously in memory in a way that is natural to thelanguage.

Note that in case of a real FFT, where different formats are available, the default value is notthe one that seems most natural for certain formats. For example, with the CCE format, stridesare set by default to L1 = 1, L2 = J1 for a real transform regardless. However, for a complexmatrix, slightly over half of the matrix is actually stored, and you should set strides to L1 = 1,L2 = J1/2+1. In case of an in-place transform with the CCE data format, even for a real array,you should set strides explicitly: L1 = 1, L2 = (J1/2+1)*2.

See Example C-23 as an illustration of how to use the configuration parameters discussedabove. See Storage schemes and Packed formats on how to define arrays for different formats.

Ordering

It is well known that a number of FFT algorithms apply an explicit permutation stage that istime consuming [4]. The exclusion of this step is similar to applying DFT to input data whoseorder is scrambled, or allowing a scrambled order of the DFT results. In applications such asconvolution and power spectrum calculation, the order of result or data is unimportant andthus permission of scrambled order is attractive if it leads to higher performance. The followingoptions are available in Intel MKL:

• DFTI_ORDERED: Forward transform data ordered, backward transform data ordered (defaultoption).

• DFTI_BACKWARD_SCRAMBLED: Forward transform data ordered, backward transform datascrambled.

Table 11-17 tabulates the effect on this configuration setting.

Table 11-17 Scrambled Order Transform

DftiComputeBackwardDftiComputeForward

Input → OutputInput → OutputDFTI_ORDERING

ordered → orderedordered → orderedDFTI_ORDERED

scrambled → orderedordered → scrambledDFTI_BACKWARD_SCRAMBLED

2498


Note that meaning of the latter two options are "allow scrambled order if practical." There aresituations where in fact allowing out of order data gives no performance advantage, and thusan implementation may choose to ignore the suggestion. Strictly speaking, the normal orderis also a scrambled order, the trivial one.

Transposition

This is an option that allows for the result of a high-dimensional transform to be presented ina transposed manner. The default setting is DFTI_NONE and can be set to DFTI_ALLOW. Similarto that of scrambled order, sometimes in higher dimension transform, performance can begained if the result is delivered in a transposed manner. DFT interface offers an option for theoutput be returned in a transposed form if performance gain is possible. Since the generic stridespecification is naturally suited for representation of transposition, this option allows the stridesfor the output to be possibly different from those originally specified by the user. Consider anexample where a two-dimensional result

Yj1,j2, 0 ≤ ji < ni,

is expected. Originally the user specified that the result be distributed in the (flat) array Y inwith generic strides L1 = 1 and L2 = n1. With the transposition option, the computation mayactually return the result into Y with stride L1 = n2 and L2 = 1. These strides can be obtainedfrom an appropriate inquiry function. Note also that in dimension 3 and above, transpositionmeans an arbitrary permutation of the dimension.

Cluster DFT FunctionsThis section describes the cluster Discrete Fourier Transform (DFT) functions implemented inIntel MKL (available with Intel® MKL Cluster Edition for Linux* and Windows*).

Starting from version 9.0, Intel MKL provides two versions of cluster DFT interface:

• The new version, which is incompatible with the interfaces implemented in Intel MKL versionslower than 9.0.

• The interface introduced in Intel MKL 8.1, which is supported for backward compatibility.

This section describes the new version of the cluster DFT interface. Documentation on the elderversion is available on the Web athttp://www.intel.com/cd/software/products/asmo-na/eng/perflib/mkl/219843.htm.

The cluster DFT function library was designed to perform Discrete Fourier Transform on acluster, that is, a group of computers interconnected via a network. Each computer (node) inthe cluster has its own memory and processor(s). Data interchanges between the nodes areprovided by the network.

2499

One or more processes may be running in parallel on each cluster node. To organizecommunication between different processes, the cluster DFT function library uses MessagePassing Interface (MPI). Given the number of available MPI implementations (for example,MPICH, Intel® MPI and others), Cluster DFT works with MPI via a message-passing library forlinear algebra, called BLACS, to avoid dependence on a specific MPI implementation.

The cluster Discrete Fourier Transform function library of Intel MKL provides one-dimensional,two-dimensional, and multi-dimensional (up to the order of 7) routines and both Fortran andC interfaces for all transform functions.

To develop applications using cluster DFT, you should have basic knowledge of and skills inMPI programming.

The interfaces for Intel Cluster MKL DFT functions are very similar to the corresponding interfacesfor conventional MKL DFT Functions described earlier in this chapter. You can refer there fordetails not explained in this section.

The full list of cluster DFT functions implemented in Intel MKL is given in Table 11-18:

Table 11-18 Cluster DFT Functions in Intel MKL



Allocates memory for the descriptor data structure andinstantiates it with default configuration settings.

DftiCreateDescriptorDM

Performs all initialization that facilitates the actual DFTcomputation.

DftiCommitDescriptorDM

Frees memory allocated for a descriptor.DftiFreeDescriptorDM


Computes the forward DFT.DftiComputeForwardDM

Computes the backward DFT.DftiComputeBackwardDM


Sets one particular configuration parameter with thespecified configuration value.

DftiSetValueDM

Gets the configuration value of one particularconfiguration parameter.

DftiGetValueDM

2500


Computing Cluster DFT

The cluster DFT functions described later in this section are implemented in Fortran and Cinterface. Fortran stands for Fortran 95.

Cluster DFT computation is performed by DftiComputeForwardDM andDftiComputeBackwardDM functions, called in a program using MPI, which will be referred toas MPI program. After an MPI program starts, a number of processes is created. MPI identifieseach process by its rank. The processes are independent of one another and communicate viaMPI. A function called in an MPI program is invoked in all the processes. Each process figuresout what to do using its rank. Input or output data for a cluster DFT transform is a sequenceof complex values. A cluster DFT computation function operates local part of the input data,i.e. some part of the data to be operated in a particular process, as well as generates local partof the output data. Each process performs its part of computations. Running in parallel andcommunicating through MPI, the processes perform the entire DFT computation. DFTcomputations using Intel MKL cluster DFT functions, should typically be effected by a numberof steps listed below:

1. Initiate MPI by calling MPI_Init in C/C++ or MPI_INIT in Fortran (the function must becalled prior to calling any DFT function and any MPI function).

2. Allocate memory for the descriptor by calling DftiCreateDescriptorDM.

3. Specify a value(s) of configuration parameters by a call(s) to DftiSetValueDM.

4. Obtain values of configuration parameters needed to create local data arrays; the valuesare retrieved by calls to DftiGetValueDM.

5. Perform initialization that facilitates DFT computation by a call to DftiCommitDescriptorDM.

6. Create arrays for local parts of input and output data and fill the local part of input data withvalues (For more information, see Distributing Data among Processes.)

7. Compute the transform by calling DftiComputeForwardDM or DftiComputeBackwardDM.

8. Gather local output data into the global array using MPI functions or employ the dataotherwise.

9. Release memory allocated for a descriptor by a call to DftiFreeDescriptorDM.

10. Finalize communication through MPI by calling MPI_Finalize in C/C++ or MPI_FINALIZEin Fortran (the function must be called after the last call to a cluster DFT function and thelast call to an MPI function).

Several code examples of using the cluster DFT interface functions in the “Examples for ClusterDFT Functions” section in Appendix C illustrate cluster DFT computations.

2501


Distributing Data among Processes

Intel MKL cluster DFT stores all input and output multi-dimensional arrays (matrices) inone-dimensional arrays (vectors). The arrays are stored in the row-major order in C/C++ andin the column-major order in Fortran. For example, a two-dimensional matrix A of size (m,n)will be stored in a vector B of size m*n so that

• B[i*n+j]=A[i][j] in C/C++ (i=0, ... , m-1, j=0, ... , n-1)

• B(j*m+i)=A(i,j) in Fortran (i=1, ... , m, j=1, ... , n).

NOTE. Order of FFT dimensions is the same as the order of array dimensions in theprogramming language. For example, a 3-dimensional FFT with Lengths=(m,n,l) canbe computed over an array Ar[m][n][l] in C/C++ or AR(m,n,l) in Fortran.

All MPI processes involved in cluster DFT computation operate their own portions of data. Theselocal arrays make up the virtual global array that the fast Fourier transform is applied to. It isyour responsibility to properly allocate local arrays (if needed), fill them with initial data andgather resulting data into an actual global array or process the resulting data otherwise. To beable do this, you should grasp how the virtual global array is composed of the local ones.

Multi-dimensional transforms

If the dimension of transform is greater than one, cluster DFT splits data in the dimensionwhose index changes most slowly, so that the parts contain all elements with several consecutivevalues of this index. It is the first dimension in C and the last one in Fortran. If the global arrayis two-dimensional, in C, it gives each process several consecutive rows. The term “rows” willbe used regardless of the array dimension and programming language. Local arrays are placedin memory allocated for the virtual global array consecutively, in the order determined byprocess ranks. For example, in case of two processes, during the computation of athree-dimensional transform whose matrix has size (11,15,12), the processes may store localarrays of sizes (6,15,12) and (5,15,12), respectively.

Let p be the number of MPI processes and the matrix of a transform to be computed have size(m,n,l). Then, in C, each MPI process will work with local data array of size (mq , n, l), where

Σmq=m, q=0, ... , p-1. Local input arrays should contain appropriate parts of the actual globalinput array. Then local output arrays will contain appropriate parts of the actual global outputarray. You can figure out which particular rows of the global array the local array should contain

2502


from the following configuration parameters of the cluster DFT interface: CDFT_LOCAL_NX,CDFT_LOCAL_START_X, and CDFT_LOCAL_SIZE. To retrieve values of the parameters, youshould use the DftiGetValueDM function:

• CDFT_LOCAL_NX specifies how many rows of the global array the current process receives.

• CDFT_LOCAL_START_X specifies which row of the global input or output array correspondsto the first row of the local input or output array. If A is a global array and L is the appropriatelocal array, then

– L[i][j][k]=A[i+cdft_local_start_x][j][k], where i=0, ... ,mq-1, j=0, ...

, n-1, k=0, ... ,l-1 for C/C++

– L(i,j,k)=A(i,j,k+cdft_local_start_x-1), where i=1, ... ,mq , j=1, ... ,

n, k=1, ... ,l for Fortran.

Example C-29 in Appendix C shows how cluster DFT distributes data among processes for atwo-dimensional FFT computation.

One-dimensional transforms

In this case, initial and resulting data are distributed among processes differently and even thenumber of elements stored in a particular process after the transform may vary compared withthe one before the transform. Each local array stores a segment of consecutive elements ofthe appropriate global array. Such segment may be determined by the number of elementsand a shift with respect to the first array element. So, to specify segments of the global inputand output arrays that a particular process receives, four configuration parameters are needed.In addition to CDFT_LOCAL_NX, and CDFT_LOCAL_START_X, the parameters CDFT_LOCAL_OUT_NXand CDFT_LOCAL_OUT_START_X have meaningful values, which can be also retrieved usingDftiGetValueDM. The meaning of the four configuration parameters depends upon the typeof the transform, as shown in Table 11-19:

Table 11-19 Data Distribution Configuration Parameters for 1D Transforms

Backward TransformForward TransformMeaning of the Parameter

CDFT_LOCAL_OUT_NXCDFT_LOCAL_NXNumber of elements in inputarray

CDFT_LOCAL_OUT_START_XCDFT_LOCAL_START_XElements shift in input array

CDFT_LOCAL_NXCDFT_LOCAL_OUT_NXNumber of elements in outputarray

2503


Backward TransformForward TransformMeaning of the Parameter

CDFT_LOCAL_START_XCDFT_LOCAL_OUT_START_XElements shift in output array

Memory size for local data

The memory size needed for local arrays cannot be just calculated from CDFT_LOCAL_NX(CDFT_LOCAL_OUT_NX), as cluster DFT sometimes requires allocating a little bit more memoryfor local data than just the size of the appropriate sub-array. The configuration parameterCDFT_LOCAL_SIZE specifies the size of the local input and output array in data elements. Inthe current implementation of cluster DFT interface, data elements are complex values, consistingof a real and imaginary parts. Each of the local input and output arrays must have size not lessthan CDFT_LOCAL_SIZE*size_of_element. If you employ a user-defined workspace for in-placetransforms (for more information, refer to Table 11-20), it must have the same size. ExampleC-30 in Appendix C illustrates how cluster DFT distributes data among processes in case of aone-dimensional FFT computation effected with a user-defined workspace.

Available Auxiliary Functions

If a global input array is located on one MPI process or you want to gather the global outputarray on one MPI process, you can use functions MKL_CDFT_ScatterData andMKL_CDFT_GatherData to distribute or gather data among processes, respectively. Thesefunctions are defined in a file that is delivered with the Intel MKL Cluster Edition product andlocated in the following subdirectory of the Intel MKL installation directory:examples/cdftc/examples_support.c for C/C++ and examples/cdftf/examples_support.f90 forFortran.

Restriction on Lengths of Transforms

The algorithm that cluster DFT uses to distribute data among processes imposes a restrictionon lengths of transforms with respect to the number of MPI processes used for the DFTcomputation:

• For a multi-dimensional transform, lengths of the first two dimensions in C/C++ or of thelast two dimensions in Fortran must be not less than the number of MPI processes.

• Length of a one-dimensional transform must be the product of two integers each of whichis not less than the number of MPI processes.

2504


Non-compliance with the restriction causes an error CDFT_SPREAD_ERROR (refer to Error Codesfor details). To achieve the compliance, you can change the transform lengths and/or thenumber of MPI processes, which is specified at start of an MPI program. MPI-2 enables changingthe number of processes during execution of an MPI program.

NOTE. The best performance of a one-dimensional FFT is reached when the transformlength L is a square of an integer number: L=N2.

Cluster DFT Interface

NOTE. Cluster DFT interface implemented in Intel MKL 9.0 is incompatible with previousversions. For details, refer to Intel MKL Release Notes.

To use the cluster DFT functions, you need to access the module MKL_CDFT through the "use"statement in Fortran; or access the header file mkl_cdfti.h through "include" in C/C++.

The Fortran interface provides a derived type DFTI_DESCRIPTOR_DM; a number of namedconstants representing various names of configuration parameters and their possible values;and a number of overloaded functions through the generic functionality of Fortran 95.

The C interface provides a structure type DFTI_DESCRIPTOR_DM_HANDLE and a number offunctions, some of which accept a different number of input arguments.

To provide communication between parallel processes through MPI, the following includestatement must be also present in your code:

• Fortran:

INCLUDE "mpif.h"

(for some MPI versions, "mpif90.h" header may be used instead).

• C/C++:

#include "mpi.h"

There are three main categories of the cluster DFT functions in Intel MKL:

2505


1. Descriptor Manipulation. There are three functions in this category. The first one,DftiCreateDescriptorDM, creates a DFT descriptor whose storage is allocated dynamicallyby the routine. The second, DftiCommitDescriptorDM, "commits" the descriptor to all itssettings. The third function, DftiFreeDescriptorDM, frees up all the memory allocatedfor the descriptor information.

2. DFT Computation. There are two functions in this category. The first one,DftiComputeForwardDM, effects a forward DFT computation, and the second function,DftiComputeBackwardDM, performs a backward DFT computation.

3. Descriptor Configuration. There are two functions in this category. One function,DftiSetValueDM, sets one specific configuration value to one of the many configurationparameters. The other, DftiGetValueDM, gets the current value of any of these configurationparameters, all of which are readable. These parameters, though many, are handled oneat a time.


There are three functions in this category: create a descriptor, commit a descriptor, and freea descriptor.

CreateDescriptorDMAllocates memory for the descriptor data structureand instantiates it with default configurationsettings.

Syntax

Fortran:

Status = DftiCreateDescriptorDM(comm, handle, v1, v2, dim, size)

Status = DftiCreateDescriptorDM(comm, handle, v1, v2, dim, sizes)

C/C++:

status = DftiCreateDescriptorDM(comm, &handle, v1, v2, dim, size );

status = DftiCreateDescriptorDM(comm, &handle, v1, v2, dim, sizes );

Input Parameters

MPI communicator, e.g. MPI_COMM_WORLD.comm

2506


Precision.v1

Type of forward domain. Must be DFTI_COMPLEX forthe current version.

v2

Dimension of transform.dim

Length of transform in a one-dimensional case.size

Lengths of transform in a multi-dimensional case.sizes

Output Parameters

Pointer to the handle of transform. If the functioncompletes successfully, the pointer to the createdhandle is stored in the variable.

handle

Description

This function allocates memory in a particular MPI process for the descriptor data structure andinstantiates it with default configuration settings with respect to the precision, domain,dimension, and length of the desired transform. The domain is understood to be the domainof the forward transform. The result is a pointer to the created descriptor. This function isslightly different from the "initialization" routine DftiCommitDescriptorDM in more traditionalsoftware packages or libraries used for computing DFT. In all likelihood, this function will notperform any significant computation work such as twiddle factors computation, as the defaultconfiguration settings can still be changed using the function DftiSetValueDM.

The precision is specified through named constants provided in the interface for the configurationvalues. The choices for precision are DFTI_SINGLE and DFTI_DOUBLE. It corresponds to precisionof input data, output data, and computation. A setting of DFTI_SINGLE indicates single-precisionfloating-point data type and a setting of DFTI_DOUBLE indicates double-precision floating-pointdata type.

Dimension is a simple positive integer indicating the dimension of the transform. In C/C++context, length is a single integer value of type long for one-dimensional transforms or anarray of integers of type long for multi-dimensional transforms. In Fortran context, length isan integer or an array of integers.

Return Values

The function returns DFTI_NO_ERROR when completes successfully. In this case, the pointer tothe created handle is stored in handle. If the function fails, it returns a value of another errorclass constant (for the list of constants, refer to the Error Codes section).

2507


Interface and Prototype! Fortran Interface

INTERFACE DftiCreateDescriptorDM

INTEGER(4) FUNCTION DftiCreateDescriptorDMn(C,H,P1,P2,D,L)

TYPE(DFTI_DESCRIPTOR_DM), POINTER :: H

INTEGER(4) C,P1,P2,D,L(*)

END FUNCTION

INTEGER(4) FUNCTION DftiCreateDescriptorDM1(C,H,P1,P2,D,L)

TYPE(DFTI_DESCRIPTOR_DM), POINTER :: H

INTEGER(4) C,P1,P2,D,L

END FUNCTION

END INTERFACE

/* C/C++ prototype */

long DftiCreateDescriptorDM(MPI_Comm,DFTI_DESCRIPTOR_DM_HANDLE*,

enum DFTI_CONFIG_VALUE,enum DFTI_CONFIG_VALUE,long,...);

CommitDescriptorDMPerforms all initialization that facilitates the actualDFT computation.

Syntax

Fortran:

Status = DftiCommitDescriptorDM(handle)

C/C++:

status = DftiCommitDescriptorDM(handle);

2508


Input Parameters

Valid handle obtained from DftiCreateDescriptorDM.handle

Description

The cluster DFT interface requires a function that commits a previously created descriptor beinvoked before the descriptor can be used for DFT computations in a particular MPI process.The DftiCommitDescriptorDM function performs all initialization that facilitates the actualDFT computation. For a modern implementation, it may involve exploring many differentfactorizations of the input length to search for highly efficient computation method.

Any changes of configuration parameters of a committed descriptor via the set value function(see Descriptor Configuration) requires a re-committal of the descriptor before a computationfunction can be invoked. Typically, this committal function call is immediately followed by acomputation function call (see DFT Computation).

Return Values

The function returns DFTI_NO_ERROR when completes successfully. If the function fails, itreturns a value of another error class constant (for the list of constants, refer to the Error Codessection).


INTERFACE DftiCommitDescriptorDM

INTEGER(4) FUNCTION DftiCommitDescriptorDM(handle);

TYPE(DFTI_DESCRIPTOR_DM), POINTER :: handle

END FUNCTION

END INTERFACE


long DftiCommitDescriptorDM(DFTI_DESCRIPTOR_DM_HANDLE handle);

2509


FreeDescriptorDMFrees memory allocated for a descriptor.

Syntax

Fortran:

Status = DftiFreeDescriptorDM(handle)

C/C++:

status = DftiFreeDescriptorDM(&handle);

Input Parameters

Valid handle obtained from DftiCreateDescriptorDM.handle

Output Parameters

Descriptor handle. Memory allocated for the handle isreleased on output.

handle

Description

This function frees up all memory allocated for a descriptor in a particular MPI process. Call theDftiFreeDescriptorDM function to delete the descriptor handle. Upon successful completionof DftiFreeDescriptorDM the descriptor handle is no longer valid.

Return Values


2510



INTERFACE DftiFreeDescriptorDM

INTEGER(4) FUNCTION DftiFreeDescriptorDM(handle)

TYPE(DFTI_DESCRIPTOR_DM), POINTER :: handle

END FUNCTION

END INTERFACE


long DftiFreeDescriptorDM(DFTI_DESCRIPTOR_DM_HANDLE *handle);


There are two functions in this category: compute the forward transform and compute thebackward transform.

ComputeForwardDMComputes the forward Discrete Fourier Transform.

Syntax

Fortran:

Status = DftiComputeForwardDM(handle, in_X, out_X)

Status = DftiComputeForwardDM(handle, in_out_X)

C/C++:

status = DftiComputeForwardDM(handle, in_X, out_X);

status = DftiComputeForwardDM(handle, in_out_X);

2511


Input Parameters

Valid descriptor handle.handle

Local part of input data. Array of complex values.Refer to the Distributing Data among Processes sectionon how to allocate and initialize the array.

in_X, in_out_X

Output Parameters

Local part of output data. Array of complex values.Refer to the Distributing Data among Processes sectionon how to allocate the array.

out_X, in_out_X

Description

As soon as a descriptor is configured and committed successfully, actual computation of DFTcan be performed. The DftiComputeForwardDM function computes the forward DFT.

Forward DFT is the transform using the factor e-i2π/n. The computation is carried out by calling

the DftiComputeForward function. So, the functions have very much in common and detailsnot explicitly mentioned below, can be found in the description of DftiComputeForward.

The valid descriptor handle is created by DftiCreateDescriptorDM and committed byDftiCommitDescriptorDM. Configuration parameters that the descriptor handle passes to thefunction are listed in Table 11-21.

Local part of input data, as well as local part of the output data, is an appropriate sequence ofcomplex values (each complex value consists of two real numbers: real part and imaginarypart) that a particular process stores. See the Distributing Data among Processes section fordetails.

The choices for precision of input and output data are the same as those for precision oftransform: a setting of DFTI_SINGLE indicates single-precision floating-point data type and asetting of DFTI_DOUBLE indicates double-precision floating-point data type.

The configuration parameter DFTI_PLACEMENT informs the function whether the computationshould be in-place. If the value of this parameter is DFTI_INPLACE (default), you should callthe function with two parameters, otherwise you should supply three parameters. IfDFTI_PLACEMENT = DFTI_INPLACE and three parameters are supplied, then the third parameterwill be ignored.

2512


CAUTION. Even in case of an out-of-place transform, local array of input data in_Xmay be changed. To save data, make its copy before calling DftiComputeForwardDM.

In case of an in-place transform, DftiComputeForwardDM will dynamically allocate and thendeallocate a work buffer of the same size as the local input/output array requires.

NOTE. You can specify your own workspace of the same size through the configurationparameter CDFT_WORKSPACE to avoid redundant memory allocation.

Return Values


2513



INTERFACE DftiComputeForwardDM

INTEGER(4) FUNCTION DftiComputeForwardDM(h, in_X, out_X)

TYPE(DFTI_DESCRIPTOR_DM), POINTER :: h

COMPLEX(8), DIMENSION(*) :: in_x, out_X

END FUNCTION DftiComputeForwardDM

INTEGER(4) FUNCTION DftiComputeForwardDMi(h, in_out_X)


COMPLEX(8), DIMENSION(*) :: in_out_X

END FUNCTION DftiComputeForwardDMi

INTEGER(4) FUNCTION DftiComputeForwardDMs(h, in_X, out_X)



END FUNCTION DftiComputeForwardDMs

INTEGER(4) FUNCTION DftiComputeForwardDMis(h, in_out_X)



END FUNCTION DftiComputeForwardDMis

END INTERFACE


long DftiComputeForwardDM(DFTI_DESCRIPTOR_DM_HANDLE handle, void *in_X,...);

2514


ComputeBackwardDMComputes the backward Discrete FourierTransform.

Syntax

Fortran:

Status = DftiComputeBackwardDM(handle, in_X, out_X)

Status = DftiComputeBackwardDM(handle, in_out_X)

C/C++:

status = DftiComputeBackwardDM(handle, in_X, out_X);

status = DftiComputeBackwardDM(handle, in_out_X);

Input Parameters


Local part of input data. Array of complex values.Refer to the Distributing Data among Processes sectionon how to allocate and initialize the array.

in_X, in_out_X

Output Parameters

Local part of output data. Array of complex values.Refer to the Distributing Data among Processes sectionon how to allocate the array.

out_X, in_out_X

Description

As soon as a descriptor is configured and committed successfully, actual computation of DFTcan be performed. The DftiComputeBackwardDM function computes the backward DFT.

Backward DFT is the transform using the factor ei2π/n. The computation is carried out by calling

the DftiComputeBackward function. So, the functions have very much in common and detailsnot explicitly mentioned below, can be found in the description of DftiComputeBackward.

The valid descriptor handle is created by DftiCreateDescriptorDM and committed byDftiCommitDescriptorDM. Configuration parameters that the descriptor handle passes to thefunction are listed in Table 11-21.

2515


Local part of input data, as well as local part of the output data, is an appropriate sequence ofcomplex values (each complex value consists of two real numbers: real part and imaginarypart) that a particular process stores. See the Distributing Data among Processes section fordetails.

The choices for precision of input and output data are the same as those for precision oftransform: a setting of DFTI_SINGLE indicates single-precision floating-point data type and asetting of DFTI_DOUBLE indicates double-precision floating-point data type.

The configuration parameter DFTI_PLACEMENT informs the function whether the computationshould be in-place. If the value of this parameter is DFTI_INPLACE (default), you should callthe function with two parameters, otherwise you should supply three parameters. IfDFTI_PLACEMENT = DFTI_INPLACE and three parameters are supplied, then the third parameterwill be ignored.

CAUTION. Even in case of an out-of-place transform, local array of input data in_Xmay be changed. To save data, make its copy before calling DftiComputeBackwardDM.

In case of an in-place transform, DftiComputeBackwardDM will dynamically allocate and thendeallocate a work buffer of the same size as the local input/output array requires.

NOTE. You can specify your own workspace of the same size through the configurationparameter CDFT_WORKSPACE to avoid redundant memory allocation.

Return Values


2516



INTERFACE DftiComputeBackwardDM

INTEGER(4) FUNCTION DftiComputeBackwardDM(h, in_X, out_X)



END FUNCTION DftiComputeBackwardDM

INTEGER(4) FUNCTION DftiComputeBackwardDMi(h, in_out_X)



END FUNCTION DftiComputeBackwardDMi

INTEGER(4) FUNCTION DftiComputeBackwardDMs(h, in_X, out_X)



END FUNCTION DftiComputeBackwardDMs

INTEGER(4) FUNCTION DftiComputeBackwardDMis(h, in_out_X)



END FUNCTION DftiComputeBackwardDMis

END INTERFACE


long DftiComputeBackwardDM(DFTI_DESCRIPTOR_DM_HANDLE handle, void *in_X,...);


There are two functions in this category: the value setting function DftiSetValueDM sets oneparticular configuration parameter to an appropriate value, the value getting functionDftiGetValueDM reads the values of one particular configuration parameter.

2517


Some configuration parameters used by cluster DFT functions originate from the conventionalDFT interface (see Configuration Settings subsection in the “DFT Functions” section for details).

Other parameters are specific for cluster DFT. Integer values of these configuration parametershave type long in C/C++ and INTEGER(4) in Fortran. The exact type of the configurationparameters being floating-point scalars is float or double in C/C++ and REAL(4) or REAL(8)in Fortran. The configuration parameters whose values are identified by named constants havethe enum type in C/C++ and INTEGER in Fortran. They are defined in the mkl_cdft.h headerfile in C/C++ and MKL_CDFT module in Fortran.

Names the CDFT-specific configuration parameters have prefix CDFT.

SetValueDMSets one particular configuration parameter withthe specified configuration value.

Syntax

Fortran:

Status = DftiSetValueDM (handle, param, value)

C/C++:

status = DftiSetValueDM (handle, param, value);

Input Parameters


Name of a parameter to be set up in the descriptorhandle. See Table 11-20 for the list of availablenames.

param

Value of a parameter.value

Description

This function sets one particular configuration parameter with the specified configuration value.The configuration parameter is one of the named constants listed in the table below, and theconfiguration value is the corresponding appropriate type. See Configuration Settings for detailsof the meaning of the setting.

2518


Table 11-20 Settable Configuration Parameters

Defaultvalue

DescriptionType of parameterNamed constant

1.0Scale factor of forwardtransform.

Floating-point scalarDFTI_FORWARD_SCALE

1.0Scale factor of backwardtransform.

Floating-point scalarDFTI_BACKWARD_SCALE

DFTI_INPLACEPlacement of the computationresult.

Named constantDFTI_PLACEMENT

DFTI_ORDEREDScrambling of data order.Named constantDFTI_ORDERING

NULL(allocateworkspacedynamically).

Auxiliary buffer, a user-definedworkspace. Enables savingmemory during in-placecomputations.

Array of anappropriate type

DFTI_WORKSPACE

Return Values


2519



INTERFACE DftiSetValueDM

INTEGER(4) FUNCTION DftiSetValueDM(h, p, v)


INTEGER(4) :: p, v

END FUNCTION

INTEGER(4) FUNCTION DftiSetValueDMd(h, p, v)


INTEGER(4) :: p

REAL(8) :: v

END FUNCTION

INTEGER(4) FUNCTION DftiSetValueDMs(h, p, v)


INTEGER(4) :: p

REAL(4) :: v

END FUNCTION

INTEGER(4) FUNCTION DftiSetValueDMsw(h, p, v)


INTEGER(4) :: p

COMPLEX(4) :: v(*)

END FUNCTION

INTEGER(4) FUNCTION DftiSetValueDMdw(h, p, v)


INTEGER(4) :: p

COMPLEX(8) :: v(*)

END FUNCTION

END INTERFACE

2520



long DftiSetValueDM(DFTI_DESCRIPTOR_DM_HANDLE handle, int param,...);

GetValueDMGets the configuration value of one particularconfiguration parameter.

Syntax

Fortran:

Status = DftiGetValueDM(handle, param, value)

C/C++:

status = DftiGetValueDM(handle, param, &value);

Input Parameters


Name of a parameter to be retrieved from thedescriptor handle. See Table 11-21 for the list ofavailable names.

param

Output Parameters

Value of the parameter.value

Description

This function gets the configuration value of one particular configuration parameter. Theconfiguration parameter is one of the named constants listed in the table below, and theconfiguration value is the corresponding appropriate type, which can be a named constant ora native type. Possible values of the named constants can be found in Table 11-6.

2521


Table 11-21 Retrievable Configuration Parameters

DescriptionType of parameterNamed Constant

Precision of computation, input data andoutput data.

Named constantDFTI_PRECISION

Dimension of the transformInteger scalarDFTI_DIMENSION

Array of lengths of the transform.Number of lengths corresponds to thedimension of the transform.

Array of integer valuesDFTI_LENGTHS

Scale factor of forward transform.Floating-point scalarDFTI_FORWARD_SCALE

Scale factor of backward transform.Floating-point scalarDFTI_BACKWARD_SCALE

Placement of the computation result.Named constantDFTI_PLACEMENT

Shows whether descriptor has beencommitted.

Named constantDFTI_COMMIT_STATUS

Forward domain of transforms (In thecurrent implementation of the cluster DFTinterface, the value is alwaysDFTI_COMPLEX.).

Named constantDFTI_FORWARD_DOMAIN

Scrambling of data order.Named constantDFTI_ORDERING

MPI communicator used for transforms.Type of MPIcommunicator

CDFT_MPI_COMM

Necessary size of input, output, andbuffer arrays in data elements.

Integer scalarCDFT_LOCAL_SIZE

Row/element number of the global arraythat corresponds to the first row/elementof the local array. For more information,see Distributing Data among Processes.

Integer scalarCDFT_LOCAL_X_START

2522


DescriptionType of parameterNamed Constant

The number of rows/elements of theglobal array stored in the local array. Formore information, see Distributing Dataamong Processes.

Integer scalarCDFT_LOCAL_NX

Element number of the appropriate globalarray that corresponds to the firstelement of the input or output local arrayin a 1D case. For details, see DistributingData among Processes.

Integer scalarCDFT_LOCAL_OUT_X_START

The number of elements of theappropriate global array that are storedin the input or output local array in a 1Dcase. For details, see Distributing Dataamong Processes.

Integer scalarCDFT_LOCAL_OUT_NX

Return Values


2523



INTERFACE DftiGetValueDM

INTEGER(4) FUNCTION DftiGetValueDM(h, p, v)


INTEGER(4) :: p, v

END FUNCTION

INTEGER(4) FUNCTION DftiGetValueDMar(h, p, v)


INTEGER(4) :: p, v(*)

END FUNCTION

INTEGER(4) FUNCTION DftiGetValueDMd(h, p, v)


INTEGER(4) :: p

REAL(8) :: v

END FUNCTION

INTEGER(4) FUNCTION DftiGetValueDMs(h, p, v)


INTEGER(4) :: p

REAL(4) :: v

END FUNCTION

END INTERFACE


long DftiGetValueDM(DFTI_DESCRIPTOR_DM_HANDLE handle, int param,...);

2524


Error Codes

All the cluster DFT functions return an integer value denoting the status of the operation. Thesevalues are identified by named constants. Each function returns DFTI_NO_ERROR if no errorswere encountered during execution. Otherwise, a function generates an error code. In additionto DFT error codes, CDFT has its own ones. Names of the CDFT-specific named constants haveprefix “CDFT”. Table 11-22 lists error codes that cluster DFT functions may return.

Table 11-22 Error Codes that Cluster DFT Functions Return


No error.DFTI_NO_ERROR

Usually associated with memory allocation.DFTI_MEMORY_ERROR

Invalid settings of one or more configuration parameters.DFTI_INVALID_CONFIGURATION

Inconsistent configuration or input parameters.DFTI_INCONSISTENT_CONFIGURATION

Number of OMP threads in the computation function is notequal to the number of OMP threads in the initialization stage(commit function).

DFTI_NUMBER_OF_THREADS_ERROR

Usually associated with OMP routine's error return value.DFTI_MULTITHREADED_ERROR

Descriptor is unusable for computation.DFTI_BAD_DESCRIPTOR

Unimplemented legitimate settings; implementationdependent.

DFTI_UNIMPLEMENTED

Internal library error.DFTI_MKL_INTERNAL_ERROR

Length of one of dimensions exceeds 232 -1 (4 bytes).DFTI_1D_LENGTH_EXCEEDS_INT32

Data cannot be distributed (For more information, seeDistributing Data among Processes.)

CDFT_SPREAD_ERROR

MPI error. Occurs when calling MPI.CDFT_MPI_ERROR

2525


12Interval Linear Solvers

Intel® MKL Interval Linear solver routines that can be used for:

– solving systems of interval linear equations Ax = b with an interval matrix A = ( aij) and intervalright-hand side vector b = ( bi);

– checking properties of interval matrices.

For more information on key concepts of interval linear systems, see Appendix A, “Linear Solvers Basics”.

Routines described below are subdivided according to the problems they solve into the following groups:

Routines for Fast Solution of Interval Systems

Routines for Sharp Solution of Interval Systems

Routines for Inverting Interval Matrices

Routines for Checking Properties of Interval Matrices

Auxiliary and Utility Routines

Table 12-1 contains the full list of Intel MKL routines for solving interval linear systems.

Table 12-1 Intel MKL Interval Linear Solver Routines

DescriptionRoutine Name

Solves a triangular system of interval linear equations by backward substitution procedure.?trtrsSolves a system of interval linear equations by interval Gauss method.?gegasSolves a system of interval linear equations by interval Householder method.?gehssSolves a system of interval linear equations by Krawczyk iteration method.?gekwsSolves a system of interval linear equations by interval Gauss-Seidel iteration.?gegssSolves a system of interval linear equations by Hansen-Bliek-Rohn procedure.?gehbsSolves a system of interval linear equations by a parameter partitioning method.?geppsSolves a system of interval linear equations by a solution partitioning method.?gepssComputes inverse interval matrix to a triangular interval matrix.?trtriComputes inverse interval matrix by Schulz interval iterative procedure.?gesziTests regularity of an interval matrix by Ris-Beeck and Rex-Rohn criteria?gerbrTests regularity/singularity of an interval matrix by Rump and Rex-Rohn singular value criteria.?gesvrPerforms midpoint-inverse preconditioning of an interval linear system.?gemip

2527

Routine Naming ConventionsFor the routines introduced below, the LAPACK-like naming conventions are used. Specifically,all the routine names have the structure xxyyzzz, where the first letters xx indicate the datatypes:

real interval, single precisionsi

real interval, double precisiondi

complex rectangular interval, single precisioncr

complex rectangular interval, double precisionzr

complex circular interval, single precisioncc

complex circular interval, double precisionzc

The third and fourth letters yy indicate the matrix type:

generalge

triangulartr

The last three letters zzz indicate the computational procedure performed by the routine:

backward substitution solver for triangular interval linear systemstrs

interval Gauss solver for interval linear systemsgas

interval Householder solver for interval linear systemshss

iterative Krawczyk solver for interval linear systemskws

interval Gauss-Seidel iteration solver for interval linear systemsgss

Hansen-Bliek-Rohn solver for interval linear systemshbs

parameter partitioning method-based solver for interval linear systemspps

solution partitioning method-based solver for interval linear systemspss

inverting triangular interval matrix based on backward substitutiontri

inverting general interval matrix by Schulz iterative methodszi

testing regularity/singularity of interval matrix by Ris-Beeck criterionrbr

testing regularity/singularity of interval matrix by Rump and Rex-Rohnsingular value criteria

svr

midpoint-inverse preconditioning of interval linear systemmip

2528


The question mark in the routine group name corresponds to different character codes indicatingthe data type (si, di, cr, zr, cc, or zc). For example, ?trtri denotes the group name foreither of the routines sitrtri, ditrtri, crtrtri, zrtrtri, cctrtri, or zctrtri.

Routines for Fast Solution of Interval Systems

?trtrsSolves a triangular system of interval linearequations by backward substitution procedure.

Syntax

call sitrtrs(uplo, trans, diag, n, nrhs, a, lda, b, ldb, info)

call ditrtrs(uplo, trans, diag, n, nrhs, a, lda, b, ldb, info)

call crtrtrs(uplo, trans, diag, n, nrhs, a, lda, b, ldb, info)

call zrtrtrs(uplo, trans, diag, n, nrhs, a, lda, b, ldb, info)

call cctrtrs(uplo, trans, diag, n, nrhs, a, lda, b, ldb, info)

call zctrtrs(uplo, trans, diag, n, nrhs, a, lda, b, ldb, info)

Description

The routine ?trtrs solves for X the following systems of interval linear equations with atriangular matrix A and multiple right-hand sides stored in B:

AX = B, if trans = 'N'

ATX = B, if trans = 'T'

AHX = B, if trans = 'C' (for complex matrices only).

The routine implements backward substitution algorithm and produces optimal enclosures ofthe solution sets to interval linear systems, which is due to the simple structure of the matrixA.

Input Parameters

CHARACTER(1). Must be one of 'U' , 'L' , 'u', or 'l'.uploIndicates whether A is upper or lower triangular.If uplo = 'U' or 'u', then A is upper triangular.

2529

Interval Linear Solvers 12

If uplo = 'L' or 'l', then A is lower triangular.

CHARACTER(1). Must be one of 'N', 'T', 'C', 'n', 't', or 'c'.transIf trans = 'N' or 'n', then AX = B is solved for X.If trans = 'T' or 't', then ATX = B is solved for X.If trans = 'C' or 'c', then AHX = B is solved for X.

CHARACTER(1). Must be one of 'N', 'U', 'n', or 'u'.diagIf diag = 'N' or 'n', then A is not a unit triangular matrix.If diag = 'U' or 'u', then A is unit triangular: diagonal elements ofA are assumed to be 1 and not referenced in the array a.

INTEGER. The order of A, the number of rows in B (n≥0).n

INTEGER. The number of right-hand sides (nrhs≥0).nrhs

S_INTERVAL for sitrtrs.a, bD_INTERVAL for ditrtrs.CR_INTERVAL for crtrtrs.ZR_INTERVAL for zrtrtrs.CC_INTERVAL for cctrtrs.ZC_INTERVAL for zctrtrs.Arrays: a (lda,*), b (ldb,*).The array a contains the matrix A.The array b contains the matrix B , whose columns are the right-handsides for the systems of equations. The second dimension of a mustbe at least max(1,n) and the second dimension of b must be at leastmax(1,nrhs).

INTEGER. The first dimension of a, lda ≥ max(1, n).lda

INTEGER. The first dimension of b, ldb ≥ max(1, n).ldb

Output Parameters


INTEGER.infoIf info = 0, the execution is successful.If info > 0, the execution is not successful.If info = -i, the i-th parameter has an illegal value.

2530


?gegasSolves a system of interval linear equations byinterval Gauss method.

Syntax

call sigegas(trans, n, nrhs, a, lda, b, ldb, info)

call digegas(trans, n, nrhs, a, lda, b, ldb, info)

call crgegas(trans, n, nrhs, a, lda, b, ldb, info)

call zrgegas(trans, n, nrhs, a, lda, b, ldb, info)

call ccgegas(trans, n, nrhs, a, lda, b, ldb, info)

call zcgegas(trans, n, nrhs, a, lda, b, ldb, info)

Description

The routine ?gegas uses the interval Gauss method to compute an enclosure of the solutionset to the following interval linear system of equations:




Input Parameters

CHARACTER(1). Must be one of 'N', 'T', 'C', 'n', 't', or 'c'.transIndicates the form of the equations system:If trans = 'N' or 'n', then AX = B is solved for X.If trans = 'T' or 't', then ATX = B is solved for X.If trans = 'C' or 'c', then AHX = B is solved for X.

INTEGER. The order of A, the number of rows in B(n ≥ 0).n


REAL for sigegas.a, bDOUBLE PRECISION for digegas.Arrays: a (lda,*), b (ldb,* ).The array a contains the matrix A.

2531


The array b contains the matrix B, whose columns are the right-handsides for the systems of equations.The second dimension of a must be at least max(1,n) and the seconddimension of b must be at least max(1,nrhs).



Output Parameters



Example 12-1 Fortran 90 Code for Interval Gauss Method

The following piece of Fortran code presents an example of using the routine digegas tocompute, by an interval Gauss method, an enclosure of the solution set to the interval linearsystem of equations:

2532


----------------------------------------------------------------------------------------------

. . . . . .

USE INTERVAL_ARITHMETIC

. . . . . .

TYPE(D_INTERVAL) :: A(2,2), B(2)

INTEGER :: N, INFO

CHARACTER(1) :: TRANS = 'n'

. . . . . .

N = 2

A(1,1) = DINTERVAL(2.,3.); A(1,2) = DINTERVAL(0.,1.)

A(2,1) = DINTERVAL(1.,2.); A(2,2) = DINTERVAL(2.,3.)

B(1,1) = DINTERVAL(0.,120.); B(2,1) = DINTERVAL(60.,240.)

. . . . . .

CALL DIGEGAS( TRANS, N, 1, A, 2, B, 2, INFO )

----------------------------------------------------------------------------------------------

Note that assigning double-precision intervals to the entries of the matrix A and right-hand sidevector B is carried out by DINTERVAL function supplied by INTERVAL_ARITHMETIC module.

?gehssSolves a system of interval linear equations byinterval Householder method.

Syntax

call sigehss(trans, n, nrhs, a, lda, b, ldb, info)

call digehss(trans, n, nrhs, a, lda, b, ldb, info)

Description

The routine ?gehss uses the interval Householder method to compute an enclosure of thesolution set to the following interval linear system of equations:

AX = B, if trans = 'N',

2533


ATX = B, if trans = 'T' or 'C'.

Input Parameters

CHARACTER(1). Must be one of 'N', 'T', 'C', 'n', 't', or 'c'.transIndicates the form of the equations system:If trans = 'N' or 'n', then AX = B is solved for X.If trans = 'T' or 'C' or 't'or 'c', then ATX = B is solved for X.

INTEGER. The order of A, the number of rows in B (n ≥ 0).n


REAL for sigehss.a, bDOUBLE PRECISION for digehss.Arrays: a (lda,*), b (ldb,*).The array a contains the matrix A.The array b contains the matrix B, whose columns are the right-handsides for the systems of equations.The second dimension of a must be at least max(1,n) and the seconddimension of b must be at least max(1,nrhs).



Output Parameters

Overwritten by an enclosure of the solution matrix X.b


2534


?gekwsSolves a system of interval linear equations byKrawczyk iteration method.

Syntax

call sigekws(trans, n, nrhs, a, lda, b, ldb, epsilon, info)

call digekws(trans, n, nrhs, a, lda, b, ldb, epsilon, info)

call crgekws(trans, n, nrhs, a, lda, b, ldb, epsilon, info)

call zrgekws(trans, n, nrhs, a, lda, b, ldb, epsilon, info)

call ccgekws(trans, n, nrhs, a, lda, b, ldb, epsilon, info)

call zcgekws(trans, n, nrhs, a, lda, b, ldb, epsilon, info)

Description

The routine ?gekws uses the Krawczyk interval iteration to compute an enclosure of the solutionset to the following interval linear system of equations:




Input Parameters




S_INTERVAL for sigekws.a, bD_INTERVAL for digekws.CR_INTERVAL for crgekws.ZR_INTERVAL for zrgekws.

2535


CC_INTERVAL for ccgekws.ZC_INTERVAL for zcgekws.Arrays: a (lda,*), b (ldb,*).The array a contains the matrix A.The array b contains the matrix B, whose columns are the right-handsides for the systems of equations.



REAL for sigekws, crgekws, and ccgekws.epsilonDOUBLE PRECISION for digekws, zrgekws, and zcgekws.The prescribed accuracy of the estimate.

Output Parameters



Application Notes

Krawczyk interval iteration already incorporates midpoint inverse preconditioning, so thatadditional application of ?gemip routines is not necessary and does not improve the overallefficiency.

2536


?gegssSolves a system of interval linear equations byinterval Gauss-Seidel iteration.

Syntax

call sigegss(trans, n, nrhs, a, lda, b, ldb, encl, epsilon, nits, info)

call digegss(trans, n, nrhs, a, lda, b, ldb, encl, epsilon, nits, info)

call crgegss(trans, n, nrhs, a, lda, b, ldb, encl, epsilon, nits, info)

call zrgegss(trans, n, nrhs, a, lda, b, ldb, encl, epsilon, nits, info)

call ccgegss(trans, n, nrhs, a, lda, b, ldb, encl, epsilon, nits, info)

call zcgegss(trans, n, nrhs, a, lda, b, ldb, encl, epsilon, nits, info)

Description

The routine ?gegss uses the interval Gauss-Seidel iteration to compute an enclosure of aportion of the solution set to the following interval linear system of equations:




See in Appendix C of this manual for example code on using this routine.

Input Parameters




S_INTERVAL for sigegss.a, bD_INTERVAL for digegss.CR_INTERVAL for crgegss.

2537


ZR_INTERVAL for zrgegss.CC_INTERVAL for ccgegss.ZC_INTERVAL for zcgegss.Arrays: a (lda,*), b (ldb,*).The array a contains the matrix A.The array b contains the matrix B, whose columns are the right-handsides for the systems of equations.

INTEGER. The first dimension of a, lda≥ max(1, n).lda

INTEGER. The first dimension of b, ldb≥ max(1, n).ldb

S_INTERVAL for sigegss.enclD_INTERVAL for digegss.CR_INTERVAL for crgegss.ZR_INTERVAL for zrgegss.CC_INTERVAL for ccgegss.ZC_INTERVAL for zcgegss.Array: encl (ldb,*).The array encl defines the interval box bounding the portion of thesolution set that the routine estimates.

REAL for sigegss, crgegss, and ccgegss.epsilonDOUBLE PRECISION for digegss, zrgegss, and zcgegss.The prescribed accuracy of the esimate.

INTEGER. The number of Gauss-Seidel iterations alloted, nits ≥ 0.nits

Output Parameters


INTEGER.infoIf info = 0, the execution is successful.If info= i > 0, then the diagonal element a(i,i) of the matrixcontains zero. The execution of the routine did not fail, but it isrecommended to interchange the rows and/or columns of the matrixso as to exclude zero-containing elements from its main diagonal.If info = -i, the i-th parameter has an illegal value.

2538


Application Notes

Interval Gauss-Seidel iteration is a local solver of interval linear systems, which means thatit is mainly intended for computing enclosures of portions of the solution set bounded by agiven interval box in the space Rn.

If the goal is to compute, by ?gegss, an enclosure of the entire solution set to an interval linearsystem of equations, then its initial (crude) enclosure should be provided through enclargument.

?gehbsSolves a system of interval linear equations byHansen-Bliek-Rohn procedure.

Syntax

call sigehbs(trans, n, a, lda, b, ldb, info)

call digehbs(trans, n, a, lda, b, ldb, info)

Description

The routine ?gehbs uses the Hansen-Bliek-Rohn procedure to compute an enclosure of thesolution set to the following interval linear system of equations:




Input Parameters



REAL for sigehbs.a, bDOUBLE PRECISION for digehbs.Arrays: a (lda,*), b (ldb,*).The array a contains the matrix A.

2539


The array b contains the matrix B, whose columns are the right-handsides for the systems of equations.



Output Parameters



Application Notes

If the middle matrix of A is not close to a diagonal matrix, then the midpoint inversepreconditioning by ?gemip routine may be necessary to correct the interval linear system andyield better results.

Routines for Sharp Solution of Interval Systems

?geppsSolves a system of interval linear equations by aparameter partitioning method.

Syntax

call sigepps( trans, n, a, lda, b, ldb, cmpt, mode, estm, epsilon, nits, info)

call digepps( trans, n, a, lda, b, ldb, cmpt, mode, estm, epsilon, nits, info)

Description

The routine ?gepps uses the parameter partitioning (PPS) method to compute some (or all) ofthe sharp outer component-wise estimates of the solution set to the following interval linearsystem of equations:


2540



Input Parameters



REAL for sigepps.a, bDOUBLE PRECISION for digepps.Arrays: a (lda,*), b (ldb).The array a contains the matrix A.The array b contains the vector B of the right-hand sides for the systemof equations.



INTEGER. The number of the component of the solution set to beestimated.

cmpt

CHARACTER(1). Must be either 'L' or 'U'(or the correspondinglowercase letters).

mode

Indicates how to estimate the solution set along the coordinate directionspecified by the parameter cmpt:if mode = 'L' or 'l', then the routine computes the lower estimateof the solution set over the cmpt-th coordinate;if mode = 'U' or 'u', then the routine computes the upper estimateof the solution set over the cmpt-th coordinate.

REAL for sigepps.epsilonDOUBLE PRECISION for digepps.The prescribed accuracy of the estimate.

INTEGER. The number of iterations of the PPS algorithm alloted, nits

≥ 0.

nits

Output Parameters

REAL for sigepps.estm

2541


DOUBLE PRECISION for digepps.Estimate of the solution set along the coordinate axis with the numbercmpt. if mode = 'L', then estm represents the lower estimate of thesolution set. if mode = 'U', then estm is equal to the upper estimateof the solution set.

The actual precision of the estimate.epsilon

The number of iterations that the algorithm actually executed.nits

INTEGER.infoIf info = 0, the execution is successful.If info = i > 0, the execution is not successful.If info = -i, the i-th parameter has an illegal value.

Application Notes

Routines ?gepps and ?gepss implement two mutually dual algorithms for computing optimalestimates of the solution sets to interval linear systems, which use adaptive partitioning ofeither parameter set or solution set of the system. The choice of either of these methods shouldbe based on the specific features of the problem under solution such as the number of essentiallyinterval parameters, the shape of the solution set, and so on.

For example, if the interval system has few interval parameters, then PPS-method is preferable,while PSS-method works better for systems that have simple shape of the solution set.

Computing optimal (or sharp) enclosures of the solution sets to interval linear systems, as wellas enclosures that are guaranteed to be sharp within a prescribed accuracy, is known to be anNP-hard problem.

Therefore, a good choice of the parameters epsilon and nits becomes crucial for effectivework of ?gepps and ?gepss routines and for producing desired results.

With this in mind, the recommendation is to organize the whole solution process with ?geppsor ?gepss interactively as a sequence of routine calls, starting from a moderate nits and arough epsilon and then increasing nits and reducing epsilon until ?gepps or ?gepss stillcomplete their execution.

2542


?gepssSolves a system of interval linear equations by asolution partitioning method.

Syntax

call sigepss( trans, n, a, lda, b, ldb, cmpt, mode, estm, epsilon, nits, info)

call digepss( trans, n, a, lda, b, ldb, cmpt, mode, estm, epsilon, nits, info)

Description

The routine ?gepss uses the (PSS) method to compute a sharp outer component-wise estimate,along a prescribed coordinate axis, of the solution set to the following interval linear system ofequations:



Input Parameters


INTEGER. The order of A, the number of rows in B (n≥0).n

REAL for sigepss.a, bDOUBLE PRECISION for digepss.Arrays: a (lda,*), b (ldb).The array a contains the matrix A.The array b contains the vector B of the right-hand sides for the systemof equations to be solved.

INTEGER. The first dimension of a, lda≥ max(1, n).lda

INTEGER. The first dimension of b, ldb≥ max(1, n).ldb

INTEGER. The number of the component of the solution set to beestimated.

cmpt

CHARACTER(1). Must be either 'L' or 'U'(or the correspondinglowercase letters).

mode

2543


Indicates how to estimate the solution set along the coordinate directionspecified by the parameter cmpt:if mode = 'L' or 'l', then the routine computes the lower estimateof the solution set over the cmpt-th coordinate;if mode = 'U' or 'u', then the routine computes the upper estimateof the solution set over the cmpt-th coordinate.

REAL for sigepss.epsilonDOUBLE PRECISION for digepss.The prescribed accuracy of the esimate for the solution set.

INTEGER. The number of iterations of the PSS algorithm alloted, nits≥0.nits

Output Parameters

REAL for sigepss.estmDOUBLE PRECISION for digepss.Estimate of the solution set along the coordinate axis with the numbercmpt. if mode = 'L', then estm represents the lower estimate of thesolution set. if mode = 'U', then estm is equal to the upper estimateof the solution set.

The actual precision of the estimate estm.epsilon

INTEGER. The number of iterations that the algorithm actually executed.nits

INTEGER.infoIf info = 0, the execution is successful.If info = i > 0, the execution is not successful.If info = -i, the i-th parameter has an illegal value and the routineoutputs the corresponding message.

Application Notes

Routines ?gepps and ?gepss implement two mutually dual algorithms for computing optimalestimates of the solution sets to interval linear systems, which use adaptive partitioning ofeither parameter set or solution set of the system. The choice of either of these methods shouldbe based on the specific features of the problem under solution such as the number of essentiallyinterval parameters, the shape of the solution set, and so on.

For example, if the interval system has few interval parameters, then PPS-method is preferable,while PSS-method works better for systems that have simple shape of the solution set.

2544


Computing optimal (or sharp) enclosures of the solution sets to interval linear systems, as wellas enclosures that are guaranteed to be sharp within a prescribed accuracy, is known to be anNP-hard problem.

Therefore, a good choice of the parameters epsilon and nits becomes crucial for effectivework of ?gepps and ?gepss routines and for producing desired results.

With this in mind, the recommendation is to organize the whole solution process with ?geppsor ?gepss interactively as a sequence of routine calls, starting from a moderate nits and arough epsilon and then increasing nits and reducing epsilon until ?gepps or ?gepss stillcomplete their execution.

Example 12-2 Fortran 90 Code for Parameter Partitioning (PPS)Method

Consider a sample problem that requires computing a sharp lower estimate (to within theaccuracy of, say, 1.E-4), along the first coordinate direction, of the solution set to the intervallinear system

The problem can be solved by the following Fortran code that implements parameter partitioningmethod (PPS-method) and uses sigepps routine:

2545


----------------------------------------------------------------------------------------------

. . . . . .


. . . . . .

INTEGER, PARAMETER :: LDA = 3, LDB = 3

INTEGER :: NITS, CMPT, INFO, I, J

CHARACTER(1) :: MODE = 'L'

REAL(4) :: EPS, ESTM

TYPE(S_INTERVAL) :: A(3,3), B(3)

. . . . . .

DO I = 1, 3

DO J = 1, 3

IF( I/=J ) THEN

A(I,J) = SINTERVAL(0.,2.)

ELSE

A(I,J) = SINTERVAL(3.5)

END IF

B(I) = SINTERVAL(-1.,1.)

END DO

END DO

CMPT = 1

NITS = 100

EPS= 1.E-4

. . . . . .

CALL SIGEPPS( 'n', 3, A, LDA, B, LDB, CMPT, MODE, ESTM, EPS, NITS, INFO )

----------------------------------------------------------------------------------------------

2546


To guarantee the completion of the algorithm, the value of the parameter NITS is set equal to100 (iterations), which is enough for the above specific example. Note that assigningsingle-precision intervals to the entries of the matrix A and right-hand side vector B is carriedout by SINTERVAL function supplied by INTERVAL_ARITHMETIC module.

Routines for Inverting Interval Matrices

?trtriComputes inverse interval matrix to a triangularinterval matrix.

Syntax

call sitrtri(uplo, diag, n, a, lda, info)

call ditrtri(uplo, diag, n, a, lda, info)

call crtrtri(uplo, diag, n, a, lda, info)

call zrtrtri(uplo, diag, n, a, lda, info)

call cctrtri(uplo, diag, n, a, lda, info)

call zctrtri(uplo, diag, n, a, lda, info)

Description

The routine ?trtri computes an interval enclosure of the inverse A-1 of an interval triangularmatrix A.

This routine implements a backward substitution algorithm and produces optimal enclosuresof the inverse interval matrix, which is due to the simple structure of the matrix to be inverted.

Input Parameters

CHARACTER(1). Must be one of 'U' , 'L' , 'u', or 'l'.uploIndicates whether A is upper or lower triangular.If uplo = 'U' or 'u', then A is upper triangular.If uplo = 'L' or 'l', then A is lower triangular.

CHARACTER(1). Must be one of 'N', 'U', 'n', or 'u'.diagIf diag = 'N' or 'n', then A is not a unit triangular matrix.

2547


If diag = 'U' or 'u', then A is unit triangular: diagonal elements ofA are assumed to be 1 and not referenced in the array a.


S_INTERVAL for sitrtri.aD_INTERVAL for ditrtri.CR_INTERVAL for crtrtri.ZR_INTERVAL for zrtrtri.CC_INTERVAL for cctrtri.ZC_INTERVAL for zctrtri.Array: DIMENSION (lda,*).Contains the matrix A.The second dimension of a must be at least max(1, n).


Output Parameters

Overwritten by an interval n-by-n matrix that encloses the inversematrix A-1.

a

INTEGER.infoIf info = 0, the execution was successful.If info = -i, the i-th parameter has an illegal value.If info = i, the i-th diagonal element of A contains zero, A is singular,and its inversion could not be completed.

?gesziComputes inverse interval matrix by Schulziterative method.

Syntax

call sigeszi(n, a, lda, info)

call digeszi(n, a, lda, info)

Description

For a general interval square matrix A, the routine ?geszi computes an enclosure of the inverseinterval matrix A-1 by the Schulz iterative method.

2548



Input Parameters


REAL for sigeszi.aDOUBLE PRECISION for digeszi.Array: DIMENSION (lda,*).Contains the matrix A.The second dimension of a must be at least max(1, n).


Output Parameters

Overwritten by an enclosure of the inverse interval matrix A-1.a


Application Notes

Schulz iteration implemented in ?geszi routine converges only provided that the interval matrixA is not “too wide”. Otherwise, when Schulz iteration diverges, the result is set equal to theinterval matrix with the elements [ -infty,+infty ], where infty is computer infinity of thecorresponding kind.

Routines for Checking Properties of Interval Matrices

?gerbrTests regularity of an interval matrix by Ris-Beeckand Rex-Rohn criteria.

Syntax

call sigerbr( n, a, lda, sr, reg, info )

call digerbr( n, a, lda, sr, reg, info )

2549


Description

The routine ?gerbr checks whether a general interval square matrix A is regular or singularby using a combination of Ris-Beeck spectral criterion and Rex-Rohn test.

Input Parameters


REAL for sigerbr.aDOUBLE PRECISION for digerbr.Array: DIMENSION (lda,*).Contains the matrix A.The second dimension of a must be at least max(1, n).


Output Parameters

REAL for sigerbr.srDOUBLE PRECISION for digerbr.An upper estimate of the spectral radius of the matrix (|(mid A)-1|· rad A).This is an additional information about the matrix A, which is crucialfor the so-called strong regularity of A.

INTEGER. Displays the results of the singularity test.regIf reg > 0, then A is regular.If reg < 0, then A is singular.If reg = 0, then the result is undetermined, that is, the test was notsufficiently sensitive to detect whether the matrix A is regular orsingular.


Application Notes

The test implemented in the routine ?gerbr is rather crude, and in critical cases furtherinvestigation of the matrix A is recommended. However, the routine may help to determine(by comparing the value of sr with 1) whether an interval matrix is strongly regular or not.

2550

?gesvrTests regularity/singularity of an interval matrixby Rump and Rex-Rohn singular value criteria.

Syntax

call sigesvr(n, a, lda, msr, rsr, reg, info)

call digesvr(n, a, lda, msr, rsr, reg, info)

Description

The routine ?gesvr checks whether a general interval square matrix A is regular or singularby using Rump and Rex-Rohn singular value criteria.

Input Parameters


REAL for sigesvr.aDOUBLE PRECISION for digesvr.Array: DIMENSION (lda,*).Contains the matrix A.The second dimension of a must be at least max(1, n).


Output Parameters

S_INTERVAL for sigesvr.msr, rsrD_INTERVAL for digesvr.Additional information about the matrix A.The intervals represent the ranges of the singular spectra of themidpoint matrix and radius matrix, respectively.

INTEGER. Displays results of the singularity test.regIf reg > 0, then A is regular.If reg < 0, then A is singular.If reg = 0, then the result is undetermined, that is, the test was notsufficiently sensitive to detect whether the matrix A is regular orsingular.

INTEGER.info

2551

If info = 0, the execution is successful.If info = i > 0, the execution is not successful.If info = -i, the i-th parameter has an illegal value.

Application Notes

The routine ?gesvr implements a test that is only a sufficient condition for a matrix to be eitherregular or singular. This means that in some boundary cases the test may prove not sensitiveenough to determine whether a given matrix is regular or singular, and the routine returns reg= 0 on output.

Example 12-3 Fortran 90 Code for Testing Regularity of Interval Matrixby Singular Value Criteria

To test regularity of the interval matrix

by singular value criteria, the following piece of Fortran 90 code may be helpful:

----------------------------------------------------------------------------------------------

. . . . . .


. . . . . .

INTEGER, PARAMETER :: LDA = 2, N = 2

TYPE(D_INTERVAL) :: A(LDA,N), MSR, RSR

INTEGER :: REG, INFO

. . . . . .

A(1,1) = DINTERVAL(2.,4.); A(1,2) = DINTERVAL(-2.,1.)

A(2,1) = DINTERVAL(-1.,2.); A(2,2) = DINTERVAL(2.,4.)

. . . . . .

CALL DIGESVR( N, A, LDA, MSR, RSR, REG, INFO )

----------------------------------------------------------------------------------------------

2552


Mutual disposition of the intervals MSR and RSR on the real axis can serve to some extent as ameasure of how large the regularity margin is (in case of MSR > RSR), or how far the matrix isfrom the regular ones (in case of MSR < RSR).

Auxiliary and Utility Routines

?gemipPerforms midpoint-inverse preconditioning of aninterval linear system.

Syntax

call sigemip(n, nrhs, a, lda, b, ldb, info)

call digemip(n, nrhs, a, lda, b, ldb, info)

call crgemip(n, nrhs, a, lda, b, ldb, info)

call zrgemip(n, nrhs, a, lda, b, ldb, info)

call ccgemip(n, nrhs, a, lda, b, ldb, info)

call zcgemip(n, nrhs, a, lda, b, ldb, info)

Description

The routine ?gemip performs midpoint-inverse preconditioning of the interval linear system AX= B. This is done through multiplying both matrices A and B by the midpoint-inverse matrix(mid A)-1 in computer (rounded) interval arithmetic.

Input Parameters



S_INTERVAL for sigemip.a, bD_INTERVAL for digemip.CR_INTERVAL for crgemip.ZR_INTERVAL for zrgemip.CC_INTERVAL for ccgemip.ZC_INTERVAL for zcgemip.

2553

Arrays: a (lda,*), b (ldb,*).The array a contains the matrix A.The array b contains the matrix B, whose columns are the right-handsides for the systems of equations.The second dimension of a must be at least max(1,n) and the seconddimension of b must be at least max(1,nrhs).



Output Parameters

Overwritten by the preconditioned matrix A.a

Overwritten by the preconditioned matrix B.b


Application Notes

Preconditioning may sometimes extend applicability of the algorithms for the solution of intervallinear systems and/or improve the quality of the results produced by these algorithms.

In particular, interval Gauss method, interval Householder method and interval Gauss-Seideliteration applied to interval linear systems with “wide” matrices that are not diagonally dominantshould be preceded by preconditioning to yield better results. For Hansen-Bliek-Rohn procedure,the midpoint-inverse preconditioning is recommended if the middle matrix of the system is farfrom diagonal.

Example 12-4 Fortran 90 Code for Preconditioning Interval LinearSystem

The following piece of Fortran code presents an example of how you can perform preconditioningof the interval linear system

2554


and then solve it by using the interval Gauss method:

----------------------------------------------------------------------------------------------

. . . . . .


. . . . . .

TYPE(D_INTERVAL) :: A(2,2), B(2,2)

INTEGER :: N = 2, NRHS = 2, LDA = 2, LDB = 2, INFO

CHARACTER(1) :: TRANS = 'n'

. . . . . .



B(1,1) = DINTERVAL(0.,2.); B(2,1) = DINTERVAL(0.,2.)

B(1,2) = DINTERVAL(-2.,2.); B(2,2) = DINTERVAL(-2.,2.)

. . . . . .

CALL DIGEMIP( N, NRHS, A, LDA, B, LDB, INFO )

CALL DIGEGAS( TRANS, N, NRHS, A, LDA, B, LDB, INFO )

----------------------------------------------------------------------------------------------

For more code examples on using this routine, see in Appendix C of this manual .

2555


13Partial Differential EquationsSupport

Intel® Math Kernel Library (Intel® MKL) provides tools for solving Partial Differential Equations (PDE).These tools are Trigonometric Transform interface routines (see Trigonometric Transform Routines) andPoisson Library (see Poisson Library Routines).

Poisson Library is designed for fast solving of simple Helmholtz, Poisson, and Laplace problems. TheTrigonometric Transform interface, which underlies the solver, is, in turn, based on Intel MKL DFT interface(refer to Fourier Transform Functions), optimized for Intel® processors.

Direct use of the Trigonometric Transform routines may be helpful to those who have already implementedtheir own solvers similar to the one that the Poisson Library provides. As it may be hard enough to modifythe original code so as to make it work with Poisson Library, you are encouraged to use fast sine, cosine,and staggered cosine transforms implemented in the Trigonometric Transform interface to improveperformance of your solver.

Both Trigonometric Transform and Poisson Library routines can be called from C and Fortran-90, althoughthe interfaces description uses C convention. Fortran-90 users can find routine calls specifics in the “CallingPDE Support Routines from Fortran-90” section.

Trigonometric Transform RoutinesIn addition to Discrete Fourier Transform (DFT) interface, described in chapter “ Fast FourierTransforms”, Intel® MKL supports the Real Discrete Trigonometric Transforms interface referred toas TT interface. The interface implements a group of routines (TT routines) used to compute sine,cosine, and staggered cosine transforms. TT interface provides much flexibility of use: you can adjustroutines to your particular needs at the cost of manual tuning routine parameters or just call routineswith default parameter values. Current Intel MKL implementation of TT interface can be used insolving partial differential equations and contains routines that are helpful for Fast Poisson and similarsolvers.

To describe Intel MKL TT interface, C convention will be used. Fortran users should refer to CallingPDE Support Routines from Fortran-90.

For the list of Trigonometric Transforms currently implemented in Intel MKL TT interface, seeTransforms Implemented.

Transforms Implemented

TT routines allow computing the following transforms:

Forward sine transform

2557

Backward sine transform

Forward cosine transform

Backward cosine transform

Forward staggered cosine transform

Backward staggered cosine transform

2558


NOTE. The size of the transform n must be even. Current implementation ofTrigonometric Transforms does not support transforms of odd size.

Sequence of Invoking TT Routines

Computation of a transform using TT interface is conceptually divided into four steps each ofwhich is performed via a dedicated routine. Table 13-1 lists names of the routines and brieflydescribes their purpose and use.

Most of TT routines have versions operating with single-precision and double-precision data.Names of such routines begin respectively with “s” and “d”. The wildcard “?” stands for eitherof these symbols in routine names.

Table 13-1 TT Interface Routines

DescriptionRoutine

Initializes basic data structures of TrigonometricTransforms.

?_init_trig_transform

Checks consistency and correctness of user-defineddata as well as creates a data structure to be usedby Intel MKL DFT interface1.

?_commit_trig_transform

Computes a forward/backward TrigonometricTransform of a specified type using the appropriateformula (see Transforms Implemented).

?_forward_trig_transform

?_backward_trig_transform

Cleans the memory used by a data structure neededfor calling DFT interface1.

free_trig_transform

1TT routines call Intel MKL DFT interface for better performance.

To find once a transformed vector for a particular input vector, the Intel MKL TT interfaceroutines are normally invoked in the order in which they are listed in Table 13-1.

NOTE. Though the order of invoking TT routines may be changed, it is highlyrecommended to follow the above order of routine calls.

2559

Partial Differential Equations Support 13

The diagram in Figure 13-1 indicates the typical order in which TT interface routines can beinvoked in a general case (prefixes and suffixes in routine names are omitted).

Figure 13-1 Typical Order of Invoking TT Interface Routines

A general scheme of using TT routines for double-precision computations is shown below.Similar scheme holds for single-precision computations with the only difference in the initialletter of routine names....

d_init_trig_transform(&n, &tt_type, ipar, dpar, &ir);

/* Change parameters in ipar if necessary. */

/* Note that the result of the Transform will be in f ! If you want topreserve the data stored in f,

save them before this place in your code */

d_commit_trig_transform(f, &handle, ipar, dpar, &ir);

d_forward_trig_transform(f, &handle, ipar, dpar, &ir);

d_backward_trig_transform(f, &handle, ipar, dpar, &ir);

free_trig_transform(&handle, ipar, &ir);

/* here the user may clean the memory used by f, dpar, ipar */

...

2560


You can find examples of Fortran-90 and C code that use TT interface routines to solveone-dimensional Helmholtz problem in the “Trigonometric Transform Code Examples” sectionin Appendix C.

Interface Description

All types in this documentation are standard C types: INT, FLOAT, and DOUBLE. Fortran-90users can call the routines with INTEGER, REAL, and DOUBLE PRECISION Fortran types,respectively (see examples in the “Trigonometric Transform Code Examples” section in AppendixC).

Routine Options

All TT routines have parameters that are used for passing various options to the routines. Theseparameters are arrays ipar, dpar and spar. Values for these parameters should be specifiedvery carefully (see Common Parameters). You can change these values during computationsto meet your needs.

NOTE. You must provide correct and consistent parameters to the routines to avoidfailure or wrong results.

User Data Arrays

TT routines take arrays of user data as input. For example, user arrays are passed to the routined_forward_trig_transform to compute a forward Trigonometric Transform. To minimizestorage requirements and improve the overall run-time efficiency, Intel MKL TT routines do notmake copies of user input arrays.

NOTE. If you need a copy of your input data arrays, you should save them yourself.

TT Routines

The section gives detailed description of TT routines, their syntax, parameters and values theyreturn. Double-precision and single-precision versions of the same routine are described together.

TT routines call Intel MKL DFT interface (described in section “DFT Functions” in chapter “FastFourier Transforms”), which enhances performance of the routines.

2561


?_init_trig_transformInitializes basic data structures of a TrigonometricTransform.

Syntax

void d_init_trig_transform (int *n, int *tt_type, int ipar[], double dpar[],int *stat);

void s_init_trig_transform (int *n, int *tt_type, int ipar[], float spar[],int *stat);

Input Parameters

int*. Contains the size of the problem, which should be aneven positive integer. Note that data vector of the transform,which other TT routines will use, must have size n+1.

n

int*. Contains the type of transform to compute, definedvia a set of named constants. The following constants areavailable in the current implementation of TT interface:MKL_SINE_TRANSFORM, MKL_COSINE_TRANSFORM andMKL_STAGGERED_COSINE_TRANSFORM.

tt_type

Output Parameters

int array of size 128. Contains integer data needed forTrigonometric Transform computations.

ipar

double array of size 3n/2+1. Contains double-precisiondata needed for Trigonometric Transform computations.

dpar

float array of size 3n/2+1. Contains single-precision dataneeded for Trigonometric Transform computations.

spar

int*. Contains the routine completion status, which is alsowritten to ipar[6]. The status should be 0 to proceed toother TT routines.

stat

Description

The routine initializes basic data structures for Trigonometric Transforms of appropriate precision.After a call to ?_init_trig_transform, all subsequently invoked TT routines use values ofipar and dpar (spar) array parameters returned by ?_init_trig_transform. The routine

2562


initializes the entire array ipar. In the dpar or spar array, ?_init_trig_transform initializeselements that do not depend upon the type of transform. For detailed description of arraysipar, dpar and spar, refer to the Common Parameters section.You can skip calling theinitialization routine in your code. For more information, see Caveat on Parameter Modifications.

Return Values

The routine successfully completed the task.In general, to proceed with computations, theroutine should complete with this stat value.

stat= 0

The routine failed to complete the task.stat= -99999

?_commit_trig_transformChecks consistency and correctness of user’s dataas well as initializes certain data structures requiredto perform the Trigonometric Transform.

Syntax

void d_commit_trig_transform (double f[], DFTI_DESCRIPTOR_HANDLE *handle,int ipar[], double dpar[], int *stat);

void s_commit_trig_transform (float f[], DFTI_DESCRIPTOR_HANDLE *handle, intipar[], float spar[], int *stat);

Input Parameters

double for d_commit_trig_transform,float fors_commit_trig_transform

f

array of size n+1, where n is the size of the problem.Contains data vector to be transformed.


ipar

double array of size 3n/2+1. Contains double-precisiondata needed for Trigonometric Transform computations.Most of the array elements are to be initialized by theroutine.

dpar

2563


float array of size 3n/2+1. Contains single-precision dataneeded for Trigonometric Transform computations. Most ofthe array elements are to be initialized by the routine.

spar

Output Parameters

DFTI_DESCRIPTOR_HANDLE*. The data structure used byIntel MKL DFT interface (for details, refer to section “DFTFunctions” in chapter “Fast Fourier Transforms”).

handle

Contains integer data needed for Trigonometric Transformcomputations. On output, ipar[6] is updated with the statvalue.

ipar

Contains double-precision data needed for TrigonometricTransform computations. On output, the entire array isinitialized.

dpar

Contains single-precision data needed for TrigonometricTransform computations. On output, the entire array isinitialized.

spar

int*. Contains the routine completion status, which is alsowritten to ipar[6].

stat

Description

The routine ?_commit_trig_transform checks consistency and correctness of the parametersto be passed to the transform routines ?_forward_trig_transform and/or?_backward_trig_transform. The routine also initializes the following data structures:handle, dpar in case of d_commit_trig_transform, and spar in case ofs_commit_trig_transform. The ?_commit_trig_transform routine initializes only thoseelements of dpar or spar that depend upon the type of transform, defined in the?_init_trig_transform routine and passed to ?_commit_trig_transform with the ipararray. The size of the problem n, which determines sizes of the array parameters, is also passedto the routine with the ipar array and defined in the previously called ?_init_trig_transformroutine. For detailed description of arrays ipar, dpar and spar, refer to the Common Parameterssection. The routine performs only a basic check for correctness and consistency of theparameters. If you are going to modify parameters of TT routines, see the Caveat on ParameterModifications section. Unlike ?_init_trig_transform, you cannot skip calling this routine inyour code.

2564


Return Values

The routine produced some warnings and madesome changes in the parameters to achieve theircorrectness and/or consistency. You may proceedwith computations by assigning ipar[6]=0 if youare sure that the parameters are correct.

stat= 11

The routine made some changes in the parametersto achieve their correctness and/or consistency. Youmay proceed with computations by assigningipar[6]=0 if you are sure that the parameters arecorrect.

stat= 10

The routine produced some warnings. You mayproceed with computations by assigning ipar[6]=0if you are sure that the parameters are correct.

stat= 1

The routine completed the task normally.stat= 0

The routine stopped for any of the following reasons:stat= -100

• An error in the user's data was encountered.

• Data in ipar, dpar or spar parameters becameincorrect and/or inconsistent as a result ofmodifications.

The routine stopped because of DFT interface error.stat= -1000

The routine stopped as the initialization failed tocomplete or parameter ipar[0] was altered bymistake.

stat= -10000

NOTE. Although positive values of stat usually indicate minor problems with the inputdata and Trigonometric Transform computations can be continued, it is highlyrecommended to investigate the problem first and achieve stat=0.

2565


?_forward_trig_transformComputes the forward Trigonometric Transform oftype specified by a parameter.

Syntax

void d_forward_trig_transform (double f[], DFTI_DESCRIPTOR_HANDLE *handle,int ipar[], double dpar[], int *stat);

void s_forward_trig_transform (float f[], DFTI_DESCRIPTOR_HANDLE *handle,int ipar[], float spar[], int *stat);

Input Parameters

double for d_forward_trig_transform,ffloat for s_forward_trig_transformarray of size n+1, where n is the size of the problem.At input, contains data vector to be transformed.


handle


ipar


dpar


spar

Output Parameters

Contains the transformed vector on output.f


ipar


stat

2566


Description

The routine computes the forward Trigonometric Transform of type defined in the?_init_trig_transform routine and passed to ?_forward_trig_transform with the ipararray. The size of the problem n, which determines sizes of the array parameters, is also passedto the routine with the ipar array and defined in the previously called ?_init_trig_transformroutine. Other data that facilitates the computation is created by ?_commit_trig_transformand supplied in dpar or spar. For detailed description of arrays ipar, dpar and spar, referto the Common Parameters section. The routine has a commit step, which calls the?_commit_trig_transform routine. The transform is computed according to formulas givenin the Transforms Implemented section. The routine replaces the input vector f with thetransformed vector.

NOTE. If you need a copy of the data vector f to be transformed, you should make thecopy before calling the ?_forward_trig_transform routine.

Return Values






The routine stopped as its commit step failed tocomplete or the parameter ipar[0] was altered bymistake.

stat= -10000

2567


?_backward_trig_transformComputes the backward Trigonometric Transformof type specified by a parameter.

Syntax

void d_backward_trig_transform (double f[], DFTI_DESCRIPTOR_HANDLE *handle,int ipar[], double dpar[], int *stat);

void s_backward_trig_transform (float f[], DFTI_DESCRIPTOR_HANDLE *handle,int ipar[], float spar[], int *stat);

Input Parameters

double for d_backward_trig_transform,ffloat for s_backward_trig_transformarray of size n+1, where n is the size of the problem. Atinput, contains data vector to be transformed.


handle


ipar


dpar


spar

Output Parameters

Contains the transformed vector on output.f


ipar


stat

2568


Description

The routine computes the backward Trigonometric Transform of type defined in the?_init_trig_transform routine and passed to ?_backward_trig_transform with the ipararray. The size of the problem n, which determines sizes of the array parameters, is also passedto the routine with the ipar array and defined in the previously called ?_init_trig_transformroutine. Other data that facilitates the computation is created by ?_commit_trig_transformand supplied in dpar or spar. For detailed description of arrays ipar, dpar and spar, referto the Common Parameters section. The routine has a commit step, which calls the?_commit_trig_transform routine. The transform is computed according to formulas givenin the Transforms Implemented section. The routine replaces the input vector f with thetransformed vector.

NOTE. If you need a copy of the data vector f to be transformed, you should make thecopy before calling the ?_backward_trig_transform routine.

Return Values






The routine stopped as its commit step failed tocomplete or the parameter ipar[0] was altered bymistake.

stat= -10000

2569


free_trig_transformCleans the memory allocated for the data structureused by DFT interface.

Syntax

void free_trig_transform (DFTI_DESCRIPTOR_HANDLE *handle, int ipar[], int*stat);

Input Parameters


ipar


handle

Output Parameters

The data structure used by Intel MKL DFT interface. Memoryallocated for the structure is released on output.

handle


ipar


stat

Description

The routine cleans the memory used by the handle structure, needed for Intel MKL DFTfunctions. If you need to release memory allocated for other parameters, you should includethe memory cleaning in your code.

Return Values



The routine failed to complete the task.stat= -99999

2570


Common Parameters

This section provides description of array parameters that hold TT routine options: ipar, dparand spar.

NOTE. Initial values are assigned to the array parameters by the appropriate?_init_trig_transform and ?_commit_trig_transform routines.

int array of size 128, holds integer data needed for TrigonometricTransform computations. Its elements are described in Table 13-2:

ipar

Table 13-2 Elements of the ipar Array

DescriptionIndex

Contains the size of the problem to solve. The ?_init_trig_transformroutine sets ipar[0]=n and all subsequently called TT routines use ipar[0]as the size of the transform. Current implementation of TT interface supportstransforms of even size only.

0

Contains error messaging options:1

• ipar[1]=-1 indicates that all error messages will be printed to the fileMKL_Trig_Transforms_log.txt in the folder from which the routine iscalled. If the file does not exist, the routine tries to create it. If theattempt fails, the routine prints information that the file cannot be createdto the standard output device.

• ipar[1]=0 indicates that no error messages will be printed.• ipar[1]=1 is the default value. It indicates that all error messages will

be printed to the preconnected default output device (usually, screen).

In case of errors, any TT routine will assign a non-zero value to statregardless of the ipar[1] setting.

Contains warning messaging options:2

• ipar[2]=-1 indicates that all warning messages will be printed to thefile MKL_Trig_Transforms_log.txt in the directory from which the routineis called. If the file does not exist, the routine tries to create it. If theattempt fails, the routine prints information that the file cannot be createdto the standard output device.

2571


DescriptionIndex

• ipar[2]=0 indicates that no warning messages will be printed.• ipar[2]=1 is the default value. It indicates that all warning messages

will be printed to the preconnected default output device (usually,screen).

In case of warnings, the stat parameter will acquire a non-zero valueregardless of the ipar[2] setting.

Reserved for future use.3 through 4

Contains the type of the transform. The ?_init_trig_transform routinesets ipar[5]=tt_type and all subsequently called TT routines use ipar[5]as the type of the transform.

5

Contains the stat value returned by the last completed TT routine. Usedto check that the previous call to a TT routine completed with stat=0.

6

Informs the ?_commit_trig_transform routines whether to initialize datastructures dpar (spar) and handle. ipar[7]=0 indicates that the routineshould skip the initialization and only check correctness and consistency of

7

the parameters. Otherwise, the routine initializes the data structures. Thedefault value is 1. The possibility to check correctness and consistency ofinput data without initializing data structures dpar, spar and handleprevents from losing performance in a repeated use of the same transformfor different data vectors.Note that you can benefit from the opportunitythat ipar[7] gives only if you are sure to have supplied proper tolerancevalue in the dpar or spar array. Otherwise, avoid tuning this parameter.

Contains message style options for TT routines. If ipar[8]=0 then TTroutines print all error and warning messages in Fortran-style notations.Otherwise, TT routines print the messages in C-style notations. The default

8

value is 1. When selecting between these notations, you should mind thatby default, numbering of elements in C arrays starts from 0 and in Fortran,it starts from 1. For example, if a part of a C-style message looks like“parameter ipar[0]=3 should be an even integer”, then the correspondingFortran-style message will be “parameter ipar(1)=3 should be an eveninteger”. ipar[8] enables users to view messages in a more convenientstyle.

2572


DescriptionIndex

Specifies the number of OpenMP threads to run TT routines in the OpenMPenvironment of the Poisson Library. The default value is 1. It is highlyrecommended not to alter this value. See also Caveat on ParameterModifications.

9

Reserved for future use.10 through 127

NOTE. You may declare the ipar array in your code as int ipar[10]. However, forcompatibility with later versions of Intel MKL TT interface, which may require more iparvalues, it is highly recommended to declare ipar as int ipar[128].

Arrays dpar and spar are similar to each other and differ only in the data precision:

double array of size 3n/2+1, holds data needed for double-precisionroutines to perform TT computations. This array is initialized in thed_init_trig_transform and d_commit_trig_transform routines.

dpar

float array of size 3n/2+1, holds data needed for single-precisionroutines to perform TT computations. This array is initialized in thes_init_trig_transform and s_commit_trig_transform routines.

spar

As dpar and spar have similar elements in respective positions, the elements are describedtogether in Table 13-3:

Table 13-3 Elements of the dpar and spar Arrays

DescriptionIndex

The element contains the first absolute tolerance used by the appropriate?_commit_trig_transform routine. For a staggered cosine or a sinetransform, f[n] should be equal to 0.0 and for a sine transform, f[0]

0

should be equal to 0.0. The ?_commit_trig_transform routine checks ifabsolute values of these parameters are below dpar[0]*n or spar[0]*n,depending on the routine precision. You can suppress warnings resultingfrom tolerance checks by setting dpar[0] or spar[0] to a sufficiently largenumber.

The element is reserved for future use.1

2573


DescriptionIndex

The elements contain tabulated values of trigonometric functions. Contentsof the elements depend upon the type of transform tt_type, set up in the?_commit_trig_transform routine:

2 through 3n/2

• If tt_type=MKL_SINE_TRANSFORM, then the array contains n/2 elementswith tabulated sine values in n/2 successive array elements starting fromthe third element (with index 2). The rest of the array is not used in thistransform.

• If tt_type=MKL_COSINE_TRANSFORM, then the array contains n elementswith tabulated cosine values in n successive array elements starting fromthe third element (with index 2). The rest of the array is not used in thistransform.

• If tt_type=MKL_STAGGERED_COSINE_TRANSFORM, then the arraycontains 3n/2-2 elements with tabulated sine and cosine values in 3n/2-2successive array elements starting from the third element (with index2). The rest of the array is not used in this transform.

NOTE. You may define the array size depending upon the type of transform.

Caveat on Parameter Modifications

Flexibility of TT interface makes it possible to skip calling the ?_init_trig_transform routineand initialize the basic data structures explicitly in your code. You may also need to modifycontents of ipar, dpar and spar arrays after initialization. When doing so, you should providecorrect and consistent data in the arrays. Mistakenly altered arrays cause errors or wrongcomputation. You can perform basic check for correctness and consistency of parameters bycalling the ?_commit_trig_transform routine but it does not ensure the correct result of atransform, it only reduces the chance of errors or wrong result.

NOTE. To supply correct and consistent parameters to TT routines, you should haveconsiderable experience in using TT interface and good understanding of elements thatthe ipar, spar and dpar arrays contain and dependencies between values of theseelements.

2574


However, in rare occurrences, even advanced users may fail to compute a transform using TTroutines after the parameter modifications.

WARNING. The only way that ensures proper computation of the TrigonometricTransforms is to follow a typical sequence of invoking the routines and not change thedefault set of parameters. So, avoid modifications of ipar, dpar and spar arrays unlessa strong need arises.


Several aspects of the Intel MKL TT interface are platform-specific and language-specific. Topromote portability across platforms and ease of use across different languages, users areprovided with Intel MKL TT language-specific header files to include in their code. Currently,the following of them are available:

• mkl_trig_transforms.h, to be used together with mkl_dfti.h, for C programs.

• mkl_trig_transforms.f90, to be used toghether with mkl_dfti.f90, for Fortran-90 programs.

NOTE. Use of the Intel MKL TT software without including one of the above header filesis not supported.

C-specific Header File

The C-specific header file defines the following function prototypes:

void d_init_trig_transform(int *, int *, int *, double *, int *);

void d_commit_trig_transform(double *, DFTI_DESCRIPTOR_HANDLE *, int *,double *, int *);

void d_forward_trig_transform(double *, DFTI_DESCRIPTOR_HANDLE *, int *, double *, int *);

void d_backward_trig_transform(double *, DFTI_DESCRIPTOR_HANDLE *, int *,double *, int *);

void s_init_trig_transform(int *, int *, int *, float *, int *);

void s_commit_trig_transform(float *, DFTI_DESCRIPTOR_HANDLE *, int *, float*, int *);

2575


void s_forward_trig_transform(float *, DFTI_DESCRIPTOR_HANDLE *, int *, float*, int *);

void s_backward_trig_transform(float *, DFTI_DESCRIPTOR_HANDLE *, int *,float *, int *);

void free_trig_transform(DFTI_DESCRIPTOR_HANDLE *, int *, int *);

2576


Fortran-Specific Header File

The Fortran-90-specific header file defines the following function prototypes:

SUBROUTINE D_INIT_TRIG_TRANSFORM(n, tt_type, ipar, dpar, stat)

INTEGER, INTENT(IN) :: n, tt_type

INTEGER, INTENT(INOUT) :: ipar(*)

REAL(8), INTENT(INOUT) :: dpar(*)

INTEGER, INTENT(OUT) :: stat

END SUBROUTINE D_INIT_TRIG_TRANSFORM

SUBROUTINE D_COMMIT_TRIG_TRANSFORM(f, handle, ipar, dpar, stat)

REAL(8), INTENT(INOUT) :: f(*)

TYPE(DFTI_DESCRIPTOR), POINTER :: handle




END SUBROUTINE D_COMMIT_TRIG_TRANSFORM

SUBROUTINE D_FORWARD_TRIG_TRANSFORM(f, handle, ipar, dpar, stat)






END SUBROUTINE D_FORWARD_TRIG_TRANSFORM

SUBROUTINE D_BACKWARD_TRIG_TRANSFORM(f, handle, ipar, dpar, stat)



2577





END SUBROUTINE D_BACKWARD_TRIG_TRANSFORM

SUBROUTINE S_INIT_TRIG_TRANSFORM(n, tt_type, ipar, spar, stat)

INTEGER, INTENT(IN) :: n, tt_type


REAL(4), INTENT(INOUT) :: spar(*)


END SUBROUTINE S_INIT_TRIG_TRANSFORM

SUBROUTINE S_COMMIT_TRIG_TRANSFORM(f, handle, ipar, spar, stat)






END SUBROUTINE S_COMMIT_TRIG_TRANSFORM

SUBROUTINE S_FORWARD_TRIG_TRANSFORM(f, handle, ipar, spar, stat)






END SUBROUTINE S_FORWARD_TRIG_TRANSFORM

2578


SUBROUTINE S_BACKWARD_TRIG_TRANSFORM(f, handle, ipar, spar, stat)






END SUBROUTINE S_BACKWARD_TRIG_TRANSFORM

SUBROUTINE FREE_TRIG_TRANSFORM(handle, ipar, stat)




END SUBROUTINE FREE_TRIG_TRANSFORM

Fortran-90 specifics of the TT routines usage are similar for all Intel MKL PDE support tools anddescribed in the Calling PDE Support Routines from Fortran-90 section.

Poisson Library RoutinesIn addition to Real Discrete Trigonometric Transforms (TT) interface (refer to TrigonometricTransform Routines), Intel® MKL supports the Poisson Library interface referred to as PL interface.The interface implements a group of routines (PL routines) used to compute a solution ofLaplace, Poisson, and Helmholtz problems of special kind using discrete Fourier transforms.Laplace and Poisson problems are special cases of a more general Helmholtz problem. Theproblems being solved are defined more exactly in the Poisson Library Implemented subsection.PL interface provides much flexibility of use: you can adjust routines to your particular needsat the cost of manual tuning routine parameters or just call routines with default parametervalues. The interface can adjust style of error and warning messages to C or Fortran notationsby setting up a dedicated parameter. This adds convenience to debugging, as users can readinformation in the way that is natural for their code. Intel MKL PL interface currently containsonly routines that implement the following solvers:

• Fast Laplace, Poisson and Helmholtz solvers in a Cartesian coordinate system

• Fast Poisson and Helmholtz solvers in a spherical coordinate system.

To describe Intel MKL PL interface, C convention will be used. Fortran usage specifics can befound in the Calling PDE Support Routines from Fortran-90 section.

2579


NOTE. Fortran users should mind that respective array indices in Fortran increase by1.

Poisson Library Implemented

PL routines enable approximate solving of certain two-dimensional and three-dimensionalproblems. Figure 13-2 shows general structure of the Poisson Library.

Figure 13-2 Structure of the Poisson Library

Sections below provide details of the problems that can be solved using Intel MKL PL.

Two-Dimensional Problems


2580


PL interface description uses the following notation for boundaries of a rectangular domain ax< x < bx, ay < y < by on a Cartesian plane:

bd_ax = {x = ax, ay ≤ y ≤ by}, bd_bx = {x = bx, ay ≤ y ≤ by}

bd_ay = {ax ≤ x ≤ bx, y = ay}, bd_by = {ax ≤ x ≤ bx, y = by}.

The wildcard "+" may stand for any of the symbols ax, bx, ay, by, so that bd_+ denotes any ofthe above boundaries.

PL interface description uses the following notation for boundaries of a rectangular domain aφ

< φ < bφ, aθ < θ < bθ on a sphere 0 ≤ φ ≤ 2 π, 0 ≤ θ ≤ π:

bd_aφ = {φ = aφ, aθ ≤ θ ≤ bθ}, bd_bφ = {φ = bφ, aθ ≤ θ ≤ bθ}

bd_aθ = {aφ ≤ φ ≤ bφ, θ = aθ}, bd_bθ = {aφ ≤ φ ≤ bφ, θ = bθ}.

The wildcard "~" may stand for any of the symbols aφ, bφ, aθ, bθ, so that bd_~ denotes any ofthe above boundaries.

Two-dimensional (2D) Helmholtz problem on a Cartesian plane

2D Helmholtz problem is to find an approximate solution of Helmholtz equation

in a rectangle, that is, a rectangular domain ax< x < bx, ay< y < by, with one of the followingboundary conditions on each boundary bd_+:

• Dirichlet boundary condition

• Neumann boundary condition

2581

where

n= -x on bd_ax, n= x on bd_bx,

n= -y on bd_ay, n= y on bd_by.

Two-dimensional (2D) Poisson problem on a Cartesian plane

The Poisson problem is a special case of the Helmholtz problem, when q=0. 2D Poisson problemis to find an approximate solution of Poisson equation

in a rectangle ax< x < bx, ay< y < by with Dirichlet or Neumann boundary condition on eachboundary bd_+. In case of a problem with Neumann boundary condition on the entire boundary,you can find the solution of the problem up to a constant only. In this case, Poisson Library willcompute the solution that provides the minimal Euclidean norm of a residual.

Two-dimensional (2D) Laplace problem on a Cartesian plane

The Laplace problem is a special case of the Helmholtz problem, when q=0 and f(x, y)=0.2D Laplace problem is to find an approximate solution of Laplace equation

in a rectangle ax< x < bx, ay< y < by with Dirichlet or Neumann boundary condition on eachboundary bd_+.

Helmholtz problem on a sphere

Helmholtz problem on a sphere is to find an approximate solution of Helmholtz equation

2582

in a spherical rectangle that is, a domain bounded by angles aφ≤ φ ≤ bφ, aθ≤ θ ≤ bθ, with boundaryconditions for particular domains listed in Table 13-4.

Table 13-4 Details of Helmholtz Problem on a Sphere

Periodic/non-periodiccase

Boundary conditionDomain on a sphere

non-periodicHomogeneous Dirichlet boundaryconditions on each boundary bd_~

Rectangular, that is, bφ - aφ < 2 π and

bθ - aθ < π

periodicHomogeneous Dirichlet boundaryconditions on the boundaries bd_aθ

and bd_bθ

Where aφ = 0, bφ = 2 π, and bθ - aθ

< π

periodicBoundary conditionEntire sphere, that is, aφ = 0, bφ = 2

π, aθ = 0, and bθ = π

at the poles.

Poisson problem on a sphere

The Poisson problem is a special case of the Helmholtz problem, when q=0. Poisson problemon a sphere is to find an approximate solution of Poisson equation

2583

in a spherical rectangle aφ≤ φ ≤ bφ, aθ≤ θ ≤ bθ in cases listed in Table 13-4. The solution to thePoisson problem on the entire sphere can be found up to a constant only. In this case, PoissonLibrary will compute the solution that provides the minimal Euclidean norm of a residual.

Approximation of 2D problems

To find an approximate solution for any of the 2D problems, a uniform mesh is built in therectangular domain:

in the Cartesian case and

in the spherical case.

Poisson Library uses standard five-point finite difference approximation on this mesh to computethe approximation to the solution. In the Cartesian case, the values of the approximate solutionwill be computed in the mesh points (xi , yj) provided that the user knows the values of theright-hand side f(x, y) in these points and the values of the appropriate boundary functionsG(x, y) and/or g(x,y) in the mesh points laying on the boundary of the rectangular domain.In the spherical case, the values of the approximate solution will be computed in the mesh

points (φi , θj) provided that the user knows the values of the right-hand side f(φ, θ) in thesepoints.

2584


NOTE. The number of mesh intervals nx in the x direction of a Cartesian mesh must be

even. The number of mesh intervals nφ in the φ direction of a spherical mesh must beeven in the non-periodic case and divisible by four in the periodic case. Currentimplementation of the Poisson Library does not support meshes with the number ofintervals that does not meet the above conditions.

Three-Dimensional Problems


PL interface description uses the following notation for boundaries of a parallelepiped domainax < x < bx, ay < y <by, az < z <bz:

bd_ax = {x = ax, ay ≤ y ≤ by, az ≤ z ≤ bz}, bd_bx = {x = bx, ay ≤ y ≤ by, az ≤ z ≤ bz}

bd_ay = {ax ≤ x ≤ bx, y = ay, az ≤ z ≤ bz}, bd_by = {ax ≤ x ≤ bx, y = by, az ≤ z ≤ bz}

bd_az = {ax ≤ x ≤ bx, ay ≤ y ≤ by, z = az}, bd_bx = {ax ≤ x ≤ bx, ay ≤ y ≤ by, z = bz}.

The wildcard "+" may stand for any of the symbols ax, bx, ay, by, az, bz, so that bd_+ denotesany of the above boundaries.

Three-dimensional (3D) Helmholtz problem

3D Helmholtz problem is to find an approximate solution of Helmholtz equation

in a parallelepiped, that is, a parallelepiped domain ax< x < bx, ay< y < by, az< z < bz, withone of the following boundary conditions on each boundary bd_+:



2585

where

n= -x on bd_ax, n= x on bd_bx,

n= -y on bd_ay, n= y on bd_by,

n= -z on bd_az, n= z on bd_bz.

Three-dimensional (3D) Poisson problem

The Poisson problem is a special case of the Helmholtz problem, when q=0. 3D Poisson problemis to find an approximate solution of Poisson equation

in a parallelepiped ax< x < bx , ay< y < by, az< z < bz with Dirichlet or Neumann boundarycondition on each boundary bd_+.

Three-dimensional (3D) Laplace problem

The Laplace problem is a special case of the Helmholtz problem, when q=0 and f(x, y,z)=0.3D Laplace problem is to find an approximate solution of Laplace equation

in a parallelepiped ax< x < bx , ay< y < by, az< z < bz with Dirichlet or Neumann boundarycondition on each boundary bd_+.

Approximation of 3D problems

To find an approximate solution for any of the 3D problems, a uniform mesh is built in theparallelepiped domain

2586

where

Poisson Library uses standard seven-point finite difference approximation on this mesh tocompute the approximation to the solution. The values of the approximate solution will becomputed in the mesh points (xi , yj , zk) provided that the user knows the values of theright-hand side f(x, y, z) in these points and the values of the appropriate boundary functionsG(x, y, z) and/or g(x, y, z) in the mesh points laying on the boundary of the parallelepipeddomain.

NOTE. The number of mesh intervals nx and ny in the x and y direction, respectively,must be even. Current implementation of Poisson Library does not support meshes withodd number of intervals.

Sequence of Invoking PL Routines

NOTE. Further description will always consider the solution process for Helmholtzproblem, as Fast Poisson Solver and Fast Laplace Solver in Cartesian coordinates arespecial cases of Fast Helmholtz Solver (see Poisson Library Implemented).

2587


Computation of a solution of Helmholtz problem using PL interface is conceptually divided intofour steps each of which is performed via a dedicated routine. Table 13-5 lists names of theroutines and briefly describes their purpose.

Most of PL routines have versions operating with single-precision and double-precision data.Names of such routines begin respectively with “s” and “d”. The wildcard “?” stands for eitherof these symbols in routine names. The routines for Cartesian coordinate system have 2D and3D versions. Their names end respectively in “2D” and “3D”. The routines for spherical coordinatesystem have periodic and non-periodic versions. Their names end respectively in “p” and “np”

Table 13-5 PL Interface Routines

DescriptionRoutine

Initializes basic data structures ofPoisson Library for Fast HelmholtzSolver in 2D/3D/periodic/non-periodiccase, respectively.

?_init_Helmholtz_2D/?_init_Helmholtz_3D/?_init_sph_p/?_init_sph_np

Checks consistency and correctness ofuser's data, creates and initializes datastructures to be used by Intel MKL DFTinterface1 as well as other datastructures needed for the solver.

?_commit_Helmholtz_2D/?_commit_Helmholtz_3D/?_commit_sph_p/?_commit_sph_np

Computes an approximate solution of2D/3D/periodic/non-periodic Helmholtzproblem (see Poisson LibraryImplemented) specified by parameters.

?_Helmholtz_2D/?_Helmholtz_3D/?_sph_p/?_sph_np

Cleans the memory used by the datastructures needed for calling Intel MKLDFT interface1.

free_Helmholtz_2D/free_Helmholtz_3D/free_sph_p/free_sph_np

1 PL routines call Intel MKL DFT interface for better performance.

To find once an approximate solution of Helmholtz problem, the Intel MKL PL interface routinesare normally invoked in the order in which they are listed in Table 13-5.

NOTE. Though the order of invoking PL routines may be changed, it is highlyrecommended to follow the above order of routine calls.

2588


The diagram in Figure 13-3 indicates the typical order in which PL routines can be invoked ina general case.

Figure 13-3 Typical Order of Invoking PL Routines

2589


A general scheme of using PL routines for double-precision computations in a 3D Cartesiancase is shown below. Similar scheme holds for single-precision computations with the onlydifference in the initial letter of routine names. The general scheme in a 2D Cartesian casediffers form the one below in the set of routine parameters and the ending of routine names:“2D” instead of “3D”....

d_init_Helmholtz_3D(&ax, &bx, &ay, &by, &az, &bz, &nx, &ny, &nz, BCtype,ipar, dpar, &stat);

/* change parameters in ipar and/or dpar if necessary. */

/* note that the result of the Fast Helmholtz Solver will be in f! If youwant to keep the data stored in f,

save it before the function call below */

d_commit_Helmholtz_3D(f, bd_ax, bd_bx, bd_ay, bd_by, bd_az, bd_bz, &xhandle,&yhandle, ipar, dpar, &stat);

d_Helmholtz_3D(f, bd_ax, bd_bx, bd_ay, bd_by, bd_az, bd_bz, &xhandle,&yhandle, ipar, dpar, &stat);

free_Helmholtz_3D (&xhandle, &yhandle, ipar, &stat);

/* here you may clean the memory used by f, dpar, ipar */

...

2590


A general scheme of using PL routines for double-precision computations in a spherical periodiccase is shown below. Similar scheme holds for single-precision computations with the onlydifference in the initial letter of routine names. The general scheme in a spherical non-periodiccase differs from the one below in the set of routine parameters and the ending of routinenames: “np” instead of “p”....

d_init_sph_p(&ap,&bp,&at,&bt,&np,&nt,&q,ipar,dpar,&stat);

/* change parameters in ipar and/or dpar if necessary. */

/* note that the result of the Fast Helmholtz Solver will be in f! If youwant to

keep the data stored in f, save it before the function call below */

d_commit_sph_p(f,&handle_s,&handle_c,ipar,dpar,&stat);

d_sph_p(f,&handle_s,&handle_c,ipar,dpar,&stat);

free_sph_p(&handle_s,&handle_c,ipar,&stat);

/* here you may clean the memory used by f, dpar, ipar */

...

You can find examples of Fortran-90 and C code that use PL routines to solve Helmholtz problem(in both Cartesian and spherical cases) in the “Poisson Library Code Examples” section inAppendix C.

Interface Description

All types in this documentation are standard C types: INT, FLOAT, and DOUBLE. Fortran-90users can call the routines with INTEGER, REAL, and DOUBLE PRECISION Fortran types,respectively (see examples in the “Poisson Library Code Examples” section in Appendix C).

Routine Options

All PL routines have parameters that are used for passing various options to the routines. Theseparameters are arrays ipar, dpar and spar. Values for these parameters should be specifiedvery carefully (see Common Parameters). You can change these values during computationsto meet your needs.

NOTE. You must provide correct and consistent parameters to the routines to avoidfailure or wrong results.

2591


User Data Arrays

PL routines take arrays of user data as input. For example, user arrays are passed to the routined_Helmholtz_3D to compute an approximate solution to 3D Helmholtz problem. To minimizestorage requirements and improve the overall run-time efficiency, Intel MKL PL routines do notmake copies of user input arrays.

NOTE. If you need a copy of your input data arrays, you should save them yourself.

PL Routines for the Cartesian Solver

The section gives detailed description of Cartesian PL routines, their syntax, parameters andvalues they return. All flavors of the same routine, namely, double-precision and single-precision,2D and 3D, are described together.

NOTE. Some of the routine parameters are used only in the 3D Fast Helmholtz Solver.

PL routines call Intel MKL DFT interface (described in section “DFT Functions” in chapter “FastFourier Transforms”), which enhances performance of the routines.

?_init_Helmholtz_2D/?_init_Helmholtz_3DInitializes basic data structures of the Fast 2D/3DHelmholtz Solver.

Syntax

void d_init_Helmholtz_2D(double* ax, double* bx, double* ay, double* by,int* nx, int* ny, char* BCtype, double* q, int* ipar, double* dpar, int*stat);

void s_init_Helmholtz_2D(float* ax, float* bx, float* ay, float* by, int*nx, int* ny, char* BCtype, float* q, int* ipar, float* spar, int* stat);

void d_init_Helmholtz_3D(double* ax, double* bx, double* ay, double* by,double* az, double* bz, int* nx, int* ny, int* nz, char* BCtype, double* q,int* ipar, double* dpar, int* stat);

2592


void s_init_Helmholtz_3D(float* ax, float* bx, float* ay, float* by, float*az, float* bz, int* nx, int* ny, int* nz, char* BCtype, float* q, int* ipar,float* spar, int* stat);

Input Parameters

double* ford_init_Helmholtz_2D/d_init_Helmholtz_3D,

ax

float* for s_init_Helmholtz_2D/s_init_Helmholtz_3D.The coordinate of the leftmost boundary of the domain alongx-axis.


bx

float* for s_init_Helmholtz_2D/s_init_Helmholtz_3D.The coordinate of the rightmost boundary of the domainalong x-axis.


ay

float* for s_init_Helmholtz_2D/s_init_Helmholtz_3D.The coordinate of the leftmost boundary of the domain alongy-axis.


by

float* for s_init_Helmholtz_2D/s_init_Helmholtz_3D.The coordinate of the rightmost boundary of the domainalong y-axis.


az

float* for s_init_Helmholtz_2D/s_init_Helmholtz_3D.The coordinate of the leftmost boundary of the domain alongz-axis. This parameter is needed only for the?_init_Helmholtz_3D routine.


bz

float* for s_init_Helmholtz_2D/s_init_Helmholtz_3D.The coordinate of the rightmost boundary of the domainalong z-axis. This parameter is needed only for the?_init_Helmholtz_3D routine.

2593


int*. The number of mesh intervals along x-axis. Must beeven.

nx

int*. The number of mesh intervals along y-axis. Must beeven in the 3D case.

ny

int*. The number of mesh intervals along z-axis. Thisparameter is needed only for the ?_init_Helmholtz_3Droutine.

nz

char*. Contains the type of boundary conditions on eachboundary. Must contain four characters for?_init_Helmholtz_2D and six characters for

BCtype

?_init_Helmholtz_3D. Each of the characters can be 'N'(Neumann boundary condition) or 'D' (Dirichlet boundarycondition). Types of boundary conditions for the boundariesshould be specified in the following order:bd_ax, bd_bx,bd_ay, bd_by, bd_az, bd_bz. Boundary condition types forthe last two boundaries should be specified only in the 3Dcase.


q

float* for s_init_Helmholtz_2D/s_init_Helmholtz_3D.The constant Helmholtz coefficient. Note that to solvePoisson or Laplace problem, you should set the value of qto 0.

Output Parameters

int array of size 128. Contains integer data to be used byFast Helmholtz Solver (for details, refer to CommonParameters).

ipar

double array of size 5*nx/2+7 in the 2D case or5*(nx+ny)/2+9 in the 3D case. Contains double-precisiondata to be used by Fast Helmholtz Solver (for details, referto Common Parameters).

dpar

float array of size 5*nx/2+7 in the 2D case or5*(nx+ny)/2+9 in the 3D case. Contains single-precisiondata to be used by Fast Helmholtz Solver (for details, referto Common Parameters).

spar

2594


int*. Routine completion status, which is also written toipar[0]. The status should be 0 to proceed to other PLroutines.

stat

Description

The routines ?_init_Helmholtz_2D/?_init_Helmholtz_3D are called to initialize basic datastructures for Poisson Library computations of the appropriate precision. All routines invokedafter a call to a ?_init_Helmholtz_2D/?_init_Helmholtz_3D routine use values of theipar, dpar and spar array parameters returned by the routine. Detailed description of thearray parameters can be found in Common Parameters.

WARNING. Data structures initialized and created by 2D/3D flavors of the routinecannot be used by 3D/2D flavors of any PL routines, respectively.

You can skip calling this routine in your code. However, see Caveat on Parameter Modificationsbefore doing so.

Return Values

The routine successfully completed the task. Ingeneral, to proceed with computations, the routineshould complete with this stat value.

stat= 0

The routine failed to complete the task because offatal error.

stat= -99999

?_commit_Helmholtz_2D/?_commit_Helmholtz_3DChecks consistency and correctness of user's dataas well as initializes certain data structures requiredto solve 2D/3D Helmholtz problem.

Syntax

void d_commit_Helmholtz_2D(double* f, double* bd_ax, double* bd_bx, double*bd_ay, double* bd_by, DFTI_DESCIPTOR* xhandle, int* ipar, double* dpar, int*stat);

2595


void s_commit_Helmholtz_2D (float* f, float* bd_ax, float* bd_bx, float*bd_ay, float* bd_by, DFTI_DESCIPTOR* xhandle, int* ipar, float* spar, int*stat);

void d_commit_Helmholtz_3D(double* f, double* bd_ax, double* bd_bx, double*bd_ay, double* bd_by, double* bd_az, double* bd_bz, DFTI_DESCIPTOR* xhandle,DFTI_DESCIPTOR* yhandle, int* ipar, double* dpar, int* stat);

void s_commit_Helmholtz_3D(float* f, float* bd_ax, float* bd_bx, float*bd_ay, float* bd_by, float* bd_az, float* bd_bz, DFTI_DESCIPTOR* xhandle,DFTI_DESCIPTOR* yhandle, int* ipar, float* spar, int* stat);

Input Parameters

double* ford_commit_Helmholtz_2D/d_commit_Helmholtz_3D,

f

float* fors_commit_Helmholtz_2D/s_commit_Helmholtz_3D.Contains the right-hand side of the problem packed in asingle vector. The size of the vector in the 2D case is(nx+1)*(ny+1). In this case, value of the right-hand sidein the mesh point (i, j) is stored in f[i+j*(nx+1)] . Thesize of the vector in the 3D case is (nx+1)*(ny+1)*(nz+1).In this case, value of the right-hand side in the mesh point(i, j, k) is stored in f[i+j*(nx+1)+k*(nx+1)*(ny+1)].Note that to solve Laplace problem, you should set all theelements of the array f to 0.Note that the array f may be altered by the routine. Pleasesave this vector in another memory location if you want topreserve it.


ipar


dpar

2596



spar


bd_ax

float* fors_commit_Helmholtz_2D/s_commit_Helmholtz_3D.Contains values of the boundary condition on the leftmostboundary of the domain along x-axis.For ?_commit_Helmholtz_2D, the size of the array is ny+1.In case of Dirichlet boundary condition (value of BCtype[0]is 'D'), it contains values of the function G(ax, yj), j=0, ...,ny. In case of Neumann boundary condition (value ofBCtype[0] is 'N'), it contains values of the function g(ax,yj), j=0, ..., ny. The value corresponding to the index j isplaced in bd_ax[j].For ?_commit_Helmholtz_3D, the size of the array is(ny+1)*(nz+1). In case of Dirichlet boundary condition(value of BCtype[0] is 'D'), it contains values of the functionG(ax, yj, zk), j=0, ..., ny, k=0, ..., nz. In case of Neumannboundary condition (value of BCtype[0] is 'N'), it containsthe values of the function g(ax, yj, zk), j=0, ..., ny, k=0,..., nz. The values are packed in the array so that the valuecorresponding to indices (j, k) is placed inbd_ax[j+k*(ny+1)].


bd_bx

float* fors_commit_Helmholtz_2D/s_commit_Helmholtz_3D.Contains values of the boundary condition on the rightmostboundary of the domain along x-axis.For ?_commit_Helmholtz_2D, the size of the array is ny+1.In case of Dirichlet boundary condition (value of BCtype[1]is 'D'), it contains values of the function G(bx, yj), j=0, ...,ny. In case of Neumann boundary condition (value of

2597


BCtype[1] is 'N'), it contains values of the function g(bx,yj), j=0, ..., ny. The value corresponding to the index j isplaced in bd_bx[j].For ?_commit_Helmholtz_3D, the size of the array is(ny+1)*(nz+1). In case of Dirichlet boundary condition(value of BCtype[1] is 'D'), it contains values of the functionG(bx, yj, zk), j=0, ..., ny, k=0, ..., nz. In case of Neumannboundary condition (value of BCtype[1] is 'N'), it containsthe values of the function g(bx, yj, zk), j=0, ..., ny, k=0,..., nz. The values are packed in the array so that the valuecorresponding to indices (j, k) is placed inbd_bx[j+k*(ny+1)].


bd_ay

float* fors_commit_Helmholtz_2D/s_commit_Helmholtz_3D.Contains values of the boundary condition on the leftmostboundary of the domain along y-axis.For ?_commit_Helmholtz_2D, the size of the array is nx+1.In case of Dirichlet boundary condition (value of BCtype[2]is 'D'), it contains values of the function G(xi, ay), i=0, ...,nx. In case of Neumann boundary condition (value ofBCtype[2] is 'N'), it contains values of the function g(xi,ay), i=0, ..., nx. The value corresponding to the index i isplaced in bd_ay[i].For ?_commit_Helmholtz_3D, the size of the array is(nx+1)*(nz+1). In case of Dirichlet boundary condition(value of BCtype[2] is 'D'), it contains values of the functionG(xi,ay, zk), i=0, ..., nx, k=0, ..., nz. In case of Neumannboundary condition (value of BCtype[2] is 'N'), it containsthe values of the function g(xi,ay, zk), i=0, ..., nx, k=0,..., nz. The values are packed in the array so that the valuecorresponding to indices (i, k) is placed inbd_ay[i+k*(nx+1)].


bd_by

2598


float* fors_commit_Helmholtz_2D/s_commit_Helmholtz_3D.Contains values of the boundary condition on the rightmostboundary of the domain along y-axis.For ?_commit_Helmholtz_2D, the size of the array is nx+1.In case of Dirichlet boundary condition (value of BCtype[3]is 'D'), it contains values of the function G(xi, by), i=0, ...,nx. In case of Neumann boundary condition (value ofBCtype[3] is 'N'), it contains values of the function g(xi,by), i=0, ..., nx. The value corresponding to the index i isplaced in bd_by[i].For ?_commit_Helmholtz_3D, the size of the array is(nx+1)*(nz+1). In case of Dirichlet boundary condition(value of BCtype[3] is 'D'), it contains values of the functionG(xi,by, zk), i=0, ..., nx, k=0, ..., nz. In case of Neumannboundary condition (value of BCtype[3] is 'N'), it containsthe values of the function g(xi,by, zk), i=0, ..., nx, k=0,..., nz. The values are packed in the array so that the valuecorresponding to indices (i, k) is placed inbd_by[i+k*(nx+1)].


bd_az

float* fors_commit_Helmholtz_2D/s_commit_Helmholtz_3D.This parameter is needed only for ?_commit_Helmholtz_3D.Contains values of the boundary condition on the leftmostboundary of the domain along z-axis.The size of the array is (nx+1)*(ny+1). In case of Dirichletboundary condition (value of BCtype[4] is 'D'), it containsvalues of the function G(xi, yj,az), i=0, ..., nx, j=0, ...,ny. In case of Neumann boundary condition (value ofBCtype[4] is 'N'), it contains the values of the function g(xi,yj,az), i=0, ..., nx, j=0, ..., ny. The values are packed inthe array so that the value corresponding to indices (i, j)is placed in bd_az[i+j*(nx+1)].


bd_bz

2599


float* fors_commit_Helmholtz_2D/s_commit_Helmholtz_3D.This parameter is needed only for ?_commit_Helmholtz_3D.Contains values of the boundary condition on the rightmostboundary of the domain along z-axis.The size of the array is (nx+1)*(ny+1). In case of Dirichletboundary condition (value of BCtype[5] is 'D'), it containsvalues of the function G(xi, yj,bz), i=0, ..., nx, j=0, ...,ny. In case of Neumann boundary condition (value ofBCtype[5] is 'N'), it contains the values of the function g(xi,yj,bz), i=0, ..., nx, j=0, ..., ny. The values are packed inthe array so that the value corresponding to indices (i, j)is placed in bd_bz[i+j*(nx+1)].

Output Parameters

Vector of the right-hand side of the problem. Possibly,altered on output.

f

Contains integer data to be used by Fast Helmholtz Solver.Modified on output as explained in Common Parameters.

ipar

Contains double-precision data to be used by Fast HelmholtzSolver. Modified on output as explained in CommonParameters.

dpar

Contains single-precision data to be used by Fast HelmholtzSolver. Modified on output as explained in CommonParameters.

spar

DESCIPTOR_HANDLE*. Data structures used by Intel MKLDFT interface (for details, refer to section “DFT Functions”in chapter “Fast Fourier Transforms”). yhandle is used onlyby ?_commit_Helmholtz_3D.

xhandle, yhandle


stat

Description

The routines ?_commit_Helmholtz_2D/?_commit_Helmholtz_3D check consistency andcorrectness of the parameters to be passed to the solver routines?_Helmholtz_2D/?_Helmholtz_3D. They also initialize data structures xhandle, yhandle as

2600


well as arrays ipar and dpar/spar, depending upon the routine precision. Refer to CommonParameters to find out which particular array elements the?_commit_Helmholtz_2D/?_commit_Helmholtz_3D routines initialize and what values arewritten there. The routines perform only a basic check for correctness and consistency. If youare going to modify parameters of PL routines, see the Caveat on Parameter Modificationssection. Unlike ?_init_Helmholtz_2D/?_init_Helmholtz_3D, you cannot skip calling theseroutines in your code. Values of ax, bx, ay, by, az, bz, nx, ny, nz, and BCtype are passed toeach of the routines with the ipar array and defined in a previous call to the appropriate?_init_Helmholtz_2D/?_init_Helmholtz_3D routine.

Return Values

The routine completed without errors and producedsome warnings.

stat= 1

The routine successfully completed the task.stat= 0

The routine stopped as an error in the user's datawas found or the data in the dpar, spar or ipararray was altered by mistake.

stat= -100

The routine stopped because of Intel MKL DFT or TTinterface error.

stat= -1000


stat= -10000


stat= -99999

?_Helmholtz_2D/?_Helmholtz_3DComputes the solution of 2D/3D Helmholtz problemspecified by the parameters.

Syntax

void d_Helmholtz_2D(double* f, double* bd_ax, double* bd_bx, double* bd_ay,double* bd_by, DFTI_DESCIPTOR* xhandle, int* ipar, double* dpar, int* stat);

void s_Helmholtz_2D (float* f, float* bd_ax, float* bd_bx, float* bd_ay,float* bd_by, DFTI_DESCIPTOR* xhandle, int* ipar, float* spar, int* stat);

2601


void d_Helmholtz_3D(double* f, double* bd_ax, double* bd_bx, double* bd_ay,double* bd_by, double* bd_az, double* bd_bz, DFTI_DESCIPTOR* xhandle,DFTI_DESCIPTOR* yhandle, int* ipar, double* dpar, int* stat);

void s_Helmholtz_3D(float* f, float* bd_ax, float* bd_bx, float* bd_ay,float* bd_by, float* bd_az, float* bd_bz, DFTI_DESCIPTOR* xhandle,DFTI_DESCIPTOR* yhandle, int* ipar, float* spar, int* stat);

Input Parameters

double* for d_Helmholtz_2D/d_Helmholtz_3D,ffloat* for s_Helmholtz_2D/s_Helmholtz_3D.Contains the right-hand side of the problem packed in asingle vector and modified by the appropriate?_commit_Helmholtz_2D/?_commit_Helmholtz_3Droutine. Note that an attempt to substitute the originalright-hand side vector at this point will result in a wrongsolution.The size of the vector in the 2D case is (nx+1)*(ny+1). Inthis case, value of the right-hand side in the mesh point (i,j) is stored in f[i+j*(nx+1)] . The size of the vector in the3D case is (nx+1)*(ny+1)*(nz+1). In this case, value ofthe right-hand side in the mesh point (i, j, k) is storedin f[i+j*(nx+1)+k*(nx+1)*(ny+1)].

DESCIPTOR_HANDLE*. Data structures used by Intel MKLDFT interface (for details, refer to section “DFT Functions”in chapter “Fast Fourier Transforms”). yhandle is used onlyby ?_Helmholtz_3D.

xhandle, yhandle


ipar


dpar


spar

2602


double* for d_Helmholtz_2D/d_Helmholtz_3D,bd_axfloat* for s_Helmholtz_2D/s_Helmholtz_3D.Contains values of the boundary condition on the leftmostboundary of the domain along x-axis.For ?_Helmholtz_2D, the size of the array is ny+1. In caseof Dirichlet boundary condition (value of BCtype[0] is 'D'),it contains values of the function G(ax, yj), j=0, ..., ny. Incase of Neumann boundary condition (value of BCtype[0]is 'N'), it contains values of the function g(ax, yj), j=0, ...,ny. The value corresponding to the index j is placed inbd_ax[j].For ?_Helmholtz_3D, the size of the array is(ny+1)*(nz+1). In case of Dirichlet boundary condition(value of BCtype[0] is 'D'), it contains values of the functionG(ax, yj, zk), j=0, ..., ny, k=0, ..., nz. In case of Neumannboundary condition (value of BCtype[0] is 'N'), it containsthe values of the function g(ax, yj, zk), j=0, ..., ny, k=0,..., nz. The values are packed in the array so that the valuecorresponding to indices (j, k) is placed inbd_ax[j+k*(ny+1)].

double* for d_Helmholtz_2D/d_Helmholtz_3D,bd_bxfloat* for s_Helmholtz_2D/s_Helmholtz_3D.Contains values of the boundary condition on the rightmostboundary of the domain along x-axis.For ?_Helmholtz_2D, the size of the array is ny+1. In caseof Dirichlet boundary condition (value of BCtype[1] is 'D'),it contains values of the function G(bx, yj), j=0, ..., ny. Incase of Neumann boundary condition (value of BCtype[1]is 'N'), it contains values of the function g(bx, yj), j=0, ...,ny. The value corresponding to the index j is placed inbd_bx[j].For ?_Helmholtz_3D, the size of the array is(ny+1)*(nz+1). In case of Dirichlet boundary condition(value of BCtype[1] is 'D'), it contains values of the functionG(bx, yj, zk), j=0, ..., ny, k=0, ..., nz. In case of Neumannboundary condition (value of BCtype[1] is 'N'), it containsthe values of the function g(bx, yj, zk), j=0, ..., ny, k=0,

2603


..., nz. The values are packed in the array so that the valuecorresponding to indices (j, k) is placed inbd_bx[j+k*(ny+1)].

double* for d_Helmholtz_2D/d_Helmholtz_3D,bd_ayfloat* for s_Helmholtz_2D/s_Helmholtz_3D.Contains values of the boundary condition on the leftmostboundary of the domain along y-axis.For ?_Helmholtz_2D, the size of the array is nx+1. In caseof Dirichlet boundary condition (value of BCtype[2] is 'D'),it contains values of the function G(xi, ay), i=0, ..., nx. Incase of Neumann boundary condition (value of BCtype[2]is 'N'), it contains values of the function g(xi, ay), i=0, ...,nx. The value corresponding to the index i is placed inbd_ay[i].For ?_Helmholtz_3D, the size of the array is(nx+1)*(nz+1). In case of Dirichlet boundary condition(value of BCtype[2] is 'D'), it contains values of the functionG(xi,ay, zk), i=0, ..., nx, k=0, ..., nz. In case of Neumannboundary condition (value of BCtype[2] is 'N'), it containsthe values of the function g(xi,ay, zk), i=0, ..., nx, k=0,..., nz. The values are packed in the array so that the valuecorresponding to indices (i, k) is placed inbd_ay[i+k*(nx+1)].

double* for d_Helmholtz_2D/d_Helmholtz_3D,bd_byfloat* for s_Helmholtz_2D/s_Helmholtz_3D.Contains values of the boundary condition on the rightmostboundary of the domain along y-axis.For ?_Helmholtz_2D, the size of the array is nx+1. In caseof Dirichlet boundary condition (value of BCtype[3] is 'D'),it contains values of the function G(xi, by), i=0, ..., nx. Incase of Neumann boundary condition (value of BCtype[3]is 'N'), it contains values of the function g(xi, by), i=0, ...,nx. The value corresponding to the index i is placed inbd_by[i].For ?_Helmholtz_3D, the size of the array is(nx+1)*(nz+1). In case of Dirichlet boundary condition(value of BCtype[3] is 'D'), it contains values of the function

2604


G(xi,by, zk), i=0, ..., nx, k=0, ..., nz. In case of Neumannboundary condition (value of BCtype[3] is 'N'), it containsthe values of the function g(xi,by, zk), i=0, ..., nx, k=0,..., nz. The values are packed in the array so that the valuecorresponding to indices (i, k) is placed inbd_by[i+k*(nx+1)].

double* for d_Helmholtz_2D/d_Helmholtz_3D,bd_azfloat* for s_Helmholtz_2D/s_Helmholtz_3D.This parameter is needed only for ?_Helmholtz_3D.Contains values of the boundary condition on the leftmostboundary of the domain along z-axis.The size of the array is (nx+1)*(ny+1). In case of Dirichletboundary condition (value of BCtype[4] is 'D'), it containsvalues of the function G(xi, yj,az), i=0, ..., nx, j=0, ...,ny. In case of Neumann boundary condition (value ofBCtype[4] is 'N'), it contains the values of the function g(xi,yj,az), i=0, ..., nx, j=0, ..., ny. The values are packed inthe array so that the value corresponding to indices (i, j)is placed in bd_az[i+j*(nx+1)].

double* for d_Helmholtz_2D/d_Helmholtz_3D,bd_bzfloat* for s_Helmholtz_2D/s_Helmholtz_3D.This parameter is needed only for ?_Helmholtz_3D.Contains values of the boundary condition on the rightmostboundary of the domain along z-axis.The size of the array is (nx+1)*(ny+1). In case of Dirichletboundary condition (value of BCtype[5] is 'D'), it containsvalues of the function G(xi, yj,bz), i=0, ..., nx, j=0, ...,ny. In case of Neumann boundary condition (value ofBCtype[5] is 'N'), it contains the values of the function g(xi,yj,bz), i=0, ..., nx, j=0, ..., ny. The values are packed inthe array so that the value corresponding to indices (i, j)is placed in bd_bz[i+j*(nx+1)].

NOTE. To avoid wrong computation result, you should not change arrays bd_ax, bd_bx,bd_ay, bd_by, bd_az, bd_bz between a call to the?_commit_Helmholtz_2D/?_commit_Helmholtz_3D routine and a subsequent call tothe appropriate ?_Helmholtz_2D/?_Helmholtz_3D routine.

2605


Output Parameters

On output, contains the approximate solution to the problempacked the same way as the right-hand side of the problemwas packed on input.

f

Data structures used by Intel MKL DFT interface.xhandle, yhandle

Contains integer data to be used by Fast Helmholtz Solver.Modified on output as explained in Common Parameters.

ipar

Contains double-precision data to be used by Fast HelmholtzSolver. Modified on output as explained in CommonParameters.

dpar

Contains single-precision data to be used by Fast HelmholtzSolver. Modified on output as explained in CommonParameters.

spar


stat

Description

The routines compute the approximate solution of Helmholtz problem defined in the previouscalls to the corresponding initialization and commit routines. The solution is computed accordingto formulas given in the Poisson Library Implemented section. The f parameter, which initiallyholds the packed vector of the right-hand side of the problem, is replaced by the computedsolution packed in the same way. Values of ax, bx, ay, by, az, bz, nx, ny, nz, and BCtype arepassed to each of the routines with the ipar array and defined in the previous call to theappropriate ?_init_Helmholtz_2D/?_init_Helmholtz_3D routine.

Return Values


stat= 1


The routine stopped as division by zero occurred. Itusually happens if the data in the dpar or spar arraywas altered by mistake.

stat= -2

The routine stopped as memory was insufficient tocomplete the computations.

stat= -3

2606



stat= -100


stat= -1000


stat= -10000


stat= -99999

free_Helmholtz_2D/free_Helmholtz_3DCleans the memory allocated for the datastructures used by DFT interface.

Syntax

void free_Helmholtz_2D(DFTI_DESCIPTOR* xhandle, int* ipar, int* stat);

void free_Helmholtz_3D (DFTI_DESCIPTOR* xhandle, DFTI_DESCIPTOR* yhandle,int* ipar, int* stat);

Input Parameters

DESCIPTOR_HANDLE*. Data structures used by Intel MKLDFT interface (for details, refer to section “DFT Functions”in chapter “Fast Fourier Transforms”). yhandle is used onlyby free_Helmholtz_3D.

xhandle, yhandle


ipar

Output Parameters

Data structures used by Intel MKL DFT interface. Memoryallocated for the structures is released on output.

xhandle, yhandle

Contains integer data to be used by Fast Helmholtz Solver.Status of the routine call is written to ipar[0].

ipar

2607


int*. Routine completion status, which is also written toipar[0].

stat

Description

The routine cleans the memory used by the xhandle and yhandle structures, needed forcalling Intel MKL DFT functions. If you need to release memory allocated for other parameters,you should include the memory cleaning in your code.

Return Values



stat= -1000


stat= -99999

PL Routines for the Spherical Solver

The section gives detailed description of spherical PL routines, their syntax, parameters andvalues they return. All flavors of the same routine, namely, double-precision and single-precision,periodic (having names ending in “p”) and non-periodic (having names ending in “np”), aredescribed together.

These PL routines also call Intel MKL DFT interface (described in section “DFT Functions” inchapter “Fast Fourier Transforms”), which enhances performance of the routines.

?_init_sph_p/?_init_sph_npInitializes basic data structures of the Fast periodicand non-periodic Helmholtz Solver on a sphere.

Syntax

void d_init_sph_p(double* ap, double* at, double* bp, double* bt, int* np,int* nt, double* q, int* ipar, double* dpar, int* stat);

void s_init_sph_np(float* ap, float* at, float* bp, float* bt, int* np, int*nt, float* q, int* ipar, float* spar, int* stat);

void d_init_sph_np(double* ap, double* at, double* bp, double* bt, int* np,int* nt, double* q, int* ipar, double* dpar, int* stat);

2608


void s_init_sph_np(float* ap, float* at, float* bp, float* bt, int* np, int*nt, float* q, int* ipar, float* spar, int* stat);

Input Parameters

double* for d_init_sph_p/d_init_sph_np,apfloat* for s_init_sph_p/s_init_sph_np.The coordinate (angle) of the leftmost boundary of the

domain along φ-axis.

double* for d_init_sph_p/d_init_sph_np,bpfloat* for s_init_sph_p/s_init_sph_np.The coordinate (angle) of the rightmost boundary of the

domain along φ-axis.

double* for d_init_sph_p/d_init_sph_np,atfloat* for s_init_sph_p/s_init_sph_np.The coordinate (angle) of the leftmost boundary of the

domain along θ-axis.

double* for d_init_sph_p/d_init_sph_np,btfloat* for s_init_sph_p/s_init_sph_np.The coordinate (angle) of the rightmost boundary of the

domain along θ-axis.

int*. The number of mesh intervals along φ-axis. Must beeven in the non-periodic case and divisible by 4 in theperiodic case.

np

int*. The number of mesh intervals along θ-axis.nt

double* for d_init_sph_p/d_init_sph_np,qfloat* for s_init_sph_p/s_init_sph_np.The constant Helmholtz coefficient. Note that to solvePoisson problem, you should set the value of q to 0.

Output Parameters

int array of size 128. Contains integer data to be used byFast Helmholtz Solver on a sphere (for details, refer toCommon Parameters).

ipar

2609


double array of size 5*np/2+nt+10. Containsdouble-precision data to be used by Fast Helmholtz Solveron a sphere (for details, refer to Common Parameters).

dpar

float array of size 5*np/2+nt+10. Containssingle-precision data to be used by Fast Helmholtz Solveron a sphere (for details, refer to Common Parameters).

spar


stat

Description

The routines ?_init_sph_p/?_init_sph_np are called to initialize basic data structures forPoisson Library computations of the appropriate precision. All routines invoked after a call toa ?_init_Helmholtz_2D/?_init_Helmholtz_3D routine use values of the ipar, dpar andspar array parameters returned by the routine. Detailed description of the array parameterscan be found in Common Parameters.

WARNING. Data structures initialized and created by periodic/non-periodic flavors ofthe routine cannot be used by non-periodic/periodic flavors of any PL routines forHelmholtz Solver on a sphere, respectively.

You can skip calling this routine in your code. However, see Caveat on Parameter Modificationsbefore doing so.

Return Values

The routine successfully completed the task. Ingeneral, to proceed with computations, the routineshould complete with this stat value.

stat= 0


stat= -99999

2610


?_commit_sph_p/?_commit_sph_npChecks consistency and correctness of user's dataas well as initializes certain data structures requiredto solve periodic/non-periodic Helmholtz problemon a sphere.

Syntax

void d_commit_sph_p(double* f, DFTI_DESCIPTOR* handle_s, DFTI_DESCIPTOR*handle_c, int* ipar, double* dpar, int* stat);

void s_commit_sph_p(float* f, DFTI_DESCIPTOR* handle_s, DFTI_DESCIPTOR*handle_c, int* ipar, float* spar, int* stat);

void d_commit_sph_np(double* f, DFTI_DESCIPTOR* handle, int* ipar, double*dpar, int* stat);

void s_commit_sph_np(float* f, DFTI_DESCIPTOR* handle, int* ipar, float*spar, int* stat);

Input Parameters

double* for d_commit_sph_p/d_commit_sph_np,ffloat* for s_commit_sph_p/s_commit_sph_np.Contains the right-hand side of the problem packed in asingle vector. The size of the vector is (np+1)*(nt+1) andvalue of the right-hand side in the mesh point (i, j) isstored in f[i+j*(np+1)] .Note that the array f may be altered by the routine. Pleasesave this vector in another memory location if you want topreserve it.


ipar


dpar

float array of size 5*np/2+nt+10. Containssingle-precision data to be used by Fast Helmholtz Solveron a sphere (for details, refer to Common Parameters).

spar

2611


Output Parameters

Vector of the right-hand side of the problem. Possibly,altered on output.

f

Contains integer data to be used by Fast Helmholtz Solveron a sphere. Modified on output as explained in CommonParameters.

ipar

Contains double-precision data to be used by Fast HelmholtzSolver on a sphere. Modified on output as explained inCommon Parameters.

dpar

Contains single-precision data to be used by Fast HelmholtzSolver on a sphere. Modified on output as explained inCommon Parameters.

spar

DESCIPTOR_HANDLE*. Data structures used by Intel MKLDFT interface (for details, refer to section “DFT Functions”in chapter “Fast Fourier Transforms”). handle_s andhandle_c are used only in ?_commit_sph_p and handleis used only in ?_commit_sph_np.

handle_s, handle_c,handle


stat

Description

The routines ?_commit_sph_p/?_commit_sph_np check consistency and correctness of theparameters to be passed to the solver routines ?_sph_p/?_sph_np, respectively. They alsoinitialize certain data structures. ?_commit_sph_p initializes structures handle_s and handle_cand ?_commit_sph_np initializes handle. The routines also initialize arrays ipar anddpar/spar, depending upon the routine precision. Refer to Common Parameters to find outwhich particular array elements the ?_commit_sph_p/?_commit_sph_np routines initializeand what values are written there. The routines perform only a basic check for correctness andconsistency. If you are going to modify parameters of PL routines, see the Caveat on ParameterModifications section. Unlike ?_init_sph_p/?_init_sph_np, you cannot skip calling theseroutines in your code. Values of np and nt are passed to each of the routines with the ipararray and defined in a previous call to the appropriate ?_init_sph_p/?_init_sph_np routine.

2612


Return Values


stat= 1



stat= -100


stat= -1000


stat= -10000


stat= -99999

?_sph_p/?_sph_npComputes the solution of a spherical Helmholtzproblem specified by the parameters.

Syntax

void d_sph_p(double* f, DFTI_DESCIPTOR* handle_s, DFTI_DESCIPTOR* handle_c,int* ipar, double* dpar, int* stat);

void s_sph_p(float* f, DFTI_DESCIPTOR* handle_s, DFTI_DESCIPTOR* handle_c,int* ipar, float* spar, int* stat);

void d_sph_np(double* f, DFTI_DESCIPTOR* handle, int* ipar, double* dpar,int* stat);

void s_sph_np(float* f, DFTI_DESCIPTOR* handle, int* ipar, float* spar, int*stat);

Input Parameters

double* for d_sph_p/d_sph_np,ffloat* for s_sph_p/s_sph_np.

2613


Contains the right-hand side of the problem packed in asingle vector and modified by the appropriate?_commit_sph_p/?_commit_sph_np routine. Note that anattempt to substitute the original right-hand side vector atthis point will result in a wrong solution.The size of the vector is (np+1)*(nt+1) and value of theright-hand side in the mesh point (i, j) is stored inf[i+j*(np+1)] .

DESCIPTOR_HANDLE*. Data structures used by Intel MKLDFT interface (for details, refer to section “DFT Functions”in chapter “Fast Fourier Transforms”). handle_s andhandle_c are used only in ?_sph_p and handle is usedonly in ?_sph_np.



ipar


dpar

float array of size 5*np/2+nt+10. Contains single-precisiondata to be used by Fast Helmholtz Solver on a sphere (fordetails, refer to Common Parameters).

spar

Output Parameters

On output, contains the approximate solution to the problempacked the same way as the right-hand side of the problemwas packed on input.

f

Data structures used by Intel MKL DFT interface.handle_s, handle_c,handle

Contains integer data to be used by Fast Helmholtz Solveron a sphere. Modified on output as explained in CommonParameters.

ipar

Contains double-precision data to be used by Fast HelmholtzSolver on a sphere. Modified on output as explained inCommon Parameters.

dpar

2614


Contains single-precision data to be used by Fast HelmholtzSolver on a sphere. Modified on output as explained inCommon Parameters.

spar


stat

Description

The routines compute the approximate solution on a sphere of the Helmholtz problem definedin the previous calls to the corresponding initialization and commit routines. The solution iscomputed according to formulas given in the Poisson Library Implemented section. The fparameter, which initially holds the packed vector of the right-hand side of the problem, isreplaced by the computed solution packed in the same way. Values of np and nt are passedto each of the routines with the ipar array and defined in the previous call to the appropriate?_init_sph_p/?_init_sph_np routine.

Return Values


stat= 1


The routine stopped as division by zero occurred. Itusually happens if the data in the dpar or spar arraywas altered by mistake.

stat= -2

The routine stopped as memory was insufficient tocomplete the computations.

stat= -3


stat= -100


stat= -1000


stat= -10000


stat= -99999

2615


free_sph_p/free_sph_npCleans the memory allocated for the datastructures used by DFT interface.

Syntax

void free_sph_p(DFTI_DESCIPTOR* handle_s, DFTI_DESCIPTOR* handle_c, int*ipar, int* stat);

void free_sph_np(DFTI_DESCIPTOR* handle, int* ipar, int* stat);

Input Parameters

DESCIPTOR_HANDLE*. Data structures used by Intel MKLDFT interface (for details, refer to section “DFT Functions”in chapter “Fast Fourier Transforms”). handle_s andhandle_c are used only in free_sph_p and handle is usedonly in free_sph_np.



ipar

Output Parameters

Data structures used by Intel MKL DFT interface. Memoryallocated for the structures is released on output.


Contains integer data to be used by Fast Helmholtz Solveron a sphere. Status of the routine call is written to ipar[0].

ipar

int*. Routine completion status, which is also written toipar[0].

stat

Description

The routine cleans the memory used by the handle_s, handle_c or handle structures, neededfor calling Intel MKL DFT functions. If you need to release memory allocated for other parameters,you should include the memory cleaning in your code.

Return Values


2616



stat= -1000


stat= -99999

Common Parameters

This section provides description of array parameters ipar, dpar and spar, which hold PLroutine options in both Cartesian and spherical cases.

NOTE. Initial values are assigned to the array parameters by the appropriate?_init_Helmholtz_2D/?_init_Helmholtz_3D/?_init_sph_p/?_init_sph_np and?_commit_Helmholtz_2D/?_commit_Helmholtz_3D/?_commit_sph_p/?_commit_sph_nproutines.

int array of size 128, holds integer data needed for Fast HelmholtzSolver (both for Cartesian and spherical coordinate systems). Itselements are described in Table 13-6:

ipar

Table 13-6 Elements of the ipar Array

DescriptionIndex

Contains status value of the last called PL routine. In general, it should be 0 toproceed with Fast Helmholtz Solver. The element has no predefined values. Thiselement can also be used to inform the

0

?_commit_Helmholtz_2D/?_commit_Helmholtz_3D/?_commit_sph_p/?_commit_sph_nproutines of how the Commit step of the computation (see Figure 13-3) should becarried out. Non-zero value of ipar[0] with decimal representation

where each of a, b, and c is equal to 0 or 9, indicates that some parts of theCommit step should be omitted. If c=9, the routine omits checking of parametersand initialization of the data structures. If b=9, then in the Cartesian case, theroutine omits the adjustment of the right-hand side vector f to the Neumannboundary condition (multiplication of boundary values by 0.5 as well asincorporation of the boundary function g) and/or Dirichlet boundary condition(setting boundary values to 0 as well as incorporation of the boundary function

2617


DescriptionIndex

G). In this case, the routine also omits the adjustment of the right-hand side vectorf to the particular boundary functions. For the Helmholtz solver on a sphere, theroutine omits computation of the spherical weights for the dpar/spar array. Ifa=9, then the routine omits the normalization of the right-hand side vector f. Inthe 2D Cartesian case, it is the multiplication by hy2, where hy is the mesh sizein the y direction (for details, see Poisson Library Implemented). In the 3D(Cartesian) case, it is the multiplication by hz2, where hz is the mesh size in thez direction. For Helmholtz solver on a sphere, it is the multiplication by hθ

2, where

hθ is the mesh size in the θ direction (for details, see Poisson Library Implemented).Using ipar[0] you can adjust the routine to your needs and gain efficiency insolving multiple Helmholtz problems that differ only in the right-hand side. Youmust be cautious using this opportunity, as misunderstanding of the commit processmay result in wrong results or program failure (see also Caveat on ParameterModifications).

Contains error messaging options:1

• ipar[1]=-1 indicates that all error messages will be printed to the fileMKL_Poisson_Library_log.txt in the folder from which the routine is called. Ifthe file does not exist, the routine tries to create it. If the attempt fails, theroutine prints information that the file cannot be created to the standard outputdevice.

• ipar[1]=0 indicates that no error messages will be printed.• ipar[1]=1 is the default value. It indicates that all error messages will be

printed to the preconnected default output device (usually, screen).

In case of errors, the stat parameter will acquire a non-zero value regardless ofthe ipar[1] setting.

Contains warning messaging options:2

• ipar[2]=-1 indicates that all warning messages will be printed to the fileMKL_Poisson_Library_log.txt in the directory from which the routine is called.If the file does not exist, the routine tries to create it. If the attempt fails, theroutine prints information that the file cannot be created to the standard outputdevice.

• ipar[2]=0 indicates that no warning messages will be printed.• ipar[2]=1 is the default value. It indicates that all warning messages will be

printed to the preconnected default output device (usually, screen).

2618


DescriptionIndex

In case of warnings, the stat parameter will acquire a non-zero value regardlessof the ipar[2] setting.

Contains the number of the combination of boundary conditions. In the Cartesiancase, it corresponds to the value that the BCtype parameter holds:

3

• In the 2D case,

0 corresponds to 'DDDD'

1 corresponds to 'DDDN'

…

15 corresponds to 'NNNN'

• In the 3D case,

0 corresponds to 'DDDDDD'

1 corresponds to 'DDDDDN'

...

63 corresponds to 'NNNNNN'.

Helmholtz solver on a sphere uses this parameter only in a periodic case. The bpand bt parameters of the ?_init_sph_p/?_init_sph_np routine, which initializesipar[3], determine its value:

• 0 corresponds to the problem without poles.• 1 corresponds to the problem on the entire sphere.

Parameters 4 through 9 are used only in Cartesian case.

Takes the value of 1 if BCtype[0]='N', 0 if BCtype[0]='D', and -1 otherwise.4




Takes the value of 1 if BCtype[4]='N', 0 if BCtype[4]='D', and -1 otherwise.This parameter is used only in the 3D case.

8

2619


DescriptionIndex

Takes the value of 1 if BCtype[5]='N', 0 if BCtype[5]='D', and -1 otherwise.This parameter is used only in the 3D case.

9

Takes the value of nx, that is, the number of intervals along x-axis in the Cartesian

case, and the value of np, that is, the number of intervals along φ-axis in thespherical case.

10

Takes the value of ny, that is, the number of intervals along y-axis in the Cartesian

case, and the value of nt, that is, the number of intervals along θ-axis in thespherical case.

11

Takes the value of nz, the number of intervals along z-axis. This parameter isused only in the 3D (Cartesian) case.

12

Takes the value of 6, which specifies the internal partitioning of the dpar/spararray.

13

Takes the value of ipar[13]+ipar[10]+1, which specifies the internal partitioningof the dpar/spar array.

14

Subsequent values of ipar depend upon the dimension of the problem or upon whether thesolver on a sphere is periodic.

Non-periodic casePeriodic case3D case2D case

Takes the value of ipar[14]+1, which specifies the internalpartitioning of the dpar/spar array.

Unused15

Takes the value of ipar[14]+ ipar[11]+1, which specifies theinternal partitioning of the dpar/spar array.

Unused16

Takes the value of ipar[16]+1, which specifies the internalpartitioning of the dpar/spar array.

Takes the value ofipar[14]+1, whichspecifies the internalpartitioning of thedpar/spar array.

17

2620


DescriptionIndex

Takes the valueof ipar[16]+3*ipar[10]

Takes the value ofipar[16]+3*ipar[10]/4+1,

Takes the valueof ipar[16]+3*ipar[10]/2+1,

Takes the value ofipar[14]+3*ipar[10]/2+1, which specifiesthe internalpartitioning of thedpar/spar array.

18

/2+1, whichspecifies theinternalpartitioning of thedpar/spar array.

which specifies theinternal partitioningof the dpar/spararray.

which specifiesthe internalpartitioning of thedpar/spar array.

UnusedTakes the value of ipar[18]+1, whichspecifies the internal partitioning of thedpar/spar array.

Unused19

UnusedTakes the value ofipar[18]+3*ipar[10]/4+1, which specifies

Takes the valueof ipar[18]+3*ipar[11]/2+1,

Unused20

the internalpartitioning of thedpar/spar array.

which specifiesthe internalpartitioning of thedpar/spar array.

Subsequent values of ipar are assigned regardless.

Contains message style options. If ipar[21]=0, then PL routines print all errorand warning messages in Fortran-style notations. If ipar[21]=1, then PL routinesprint the messages in C-style notations. The default value is 1.

21

Contains the number of threads to be used for computations in a multithreadedenvironment. The default value is 1.

22

Unused in the current implementation of the Poisson Library.23through39

Contain the first twenty elements of the ipar array of the first TrigonometricTransform that the Solver uses. (For details, see Common Parameters in the“Trigonometric Transform Routines” chapter.)

40through59

Contain the first twenty elements of the ipar array of the second TrigonometricTransform that the 3D and periodic solvers use. (For details, see CommonParameters in the “Trigonometric Transform Routines” chapter.)

60through79

2621


NOTE. You may declare the ipar array in your code as int ipar[80]. However, forcompatibility with later versions of Intel MKL Poisson Library, which may require moreipar values, it is highly recommended to declare ipar as int ipar[128].

Arrays dpar and spar are similar to each other and differ only in the data precision:

Holds data needed for double-precision Fast Helmholtz Solvercomputations.

dpar

• For the Cartesian solver, double array of size 5*nx/2+7 in the 2Dcase or 5*(nx+ny)/2+9 in the 3D case; initialized in thed_init_Helmholtz_2D/d_init_Helmholtz_3D andd_commit_Helmholtz_2D/d_commit_Helmholtz_3D routines.

• For the spherical solver, double array of size 5*np/2+nt+10;initialized in the d_init_sph_p/d_init_sph_np andd_commit_sph_p/d_commit_sph_np routines.

Holds data needed for single-precision Fast Helmholtz Solvercomputations.

spar

• For the Cartesian solver, float array of size 5*nx/2+7 in the 2Dcase or 5*(nx+ny)/2+9 in the 3D case; initialized in thes_init_Helmholtz_2D/s_init_Helmholtz_3D ands_commit_Helmholtz_2D/s_commit_Helmholtz_3D routines.

• For the spherical solver, float array of size 5*np/2+nt+10;initialized in the s_init_sph_p/s_init_sph_np ands_commit_sph_p/s_commit_sph_np routines.

As dpar and spar have similar elements in respective positions, the elements are describedtogether in Table 13-7:

Table 13-7 Elements of the dpar and spar Arrays

DescriptionIndex

In the Cartesian case, contains the length of the interval along x-axis rightafter a call to the ?_init_Helmholtz_2D/?_init_Helmholtz_3D routineor the mesh size hx in the x direction (for details, see Poisson LibraryImplemented) after a call to the?_commit_Helmholtz_2D/?_commit_Helmholtz_3D routine.

0

2622


DescriptionIndex

In the spherical case, contains the length of the interval along φ-axis rightafter a call to the ?_init_sph_p/?_init_sph_np routine or the mesh size

hφ in the φ direction (for details, see Poisson Library Implemented) after acall to the ?_commit_sph_p/?_commit_sph_np routine.

In the Cartesian case, contains the length of the interval along y-axis rightafter a call to the ?_init_Helmholtz_2D/?_init_Helmholtz_3D routineor the mesh size hy in the y direction (for details, see Poisson LibraryImplemented) after a call to the?_commit_Helmholtz_2D/?_commit_Helmholtz_3D routine.

1

In the spherical case, contains the length of the interval along θ-axis rightafter a call to the ?_init_sph_p/?_init_sph_np routine or the mesh size

hθ in the θ direction (for details, see Poisson Library Implemented) after acall to the ?_commit_sph_p/?_commit_sph_np routine.

In the Cartesian case, contains the length of the interval along z-axis rightafter a call to the ?_init_Helmholtz_2D/?_init_Helmholtz_3D routineor the mesh size hz in the z direction (for details, see Poisson Library

2

Implemented) after a call to the?_commit_Helmholtz_2D/?_commit_Helmholtz_3D routine. In theCartesian solver, this parameter is used only in the 3D case.

In the spherical solver, contains the coordinate of the leftmost boundaryalong θ-axis after a call to the ?_init_sph_p/?_init_sph_np routine.

Contains the value of the coefficient q after a call to the?_init_Helmholtz_2D/?_init_Helmholtz_3D/?_init_sph_p/?_init_sph_nproutine.

3

Contains the tolerance parameter after a call to the?_init_Helmholtz_2D/?_init_Helmholtz_3D/?_init_sph_p/?_init_sph_nproutine.

4

In the Cartesian case, this value is used only for the pure Neumann boundaryconditions ( BCtype="NNNN" in the 2D case; BCtype="NNNNNN" in the 3Dcase). This is a special case, as the right-hand side of the problem cannotbe arbitrary if the coefficient q is zero. Poisson Library verifies that theclassical solution exists (up to rounding errors) using this tolerance. In anycase, Poisson Library computes the normal solution, that is, the solution

2623


DescriptionIndex

that has the minimal Euclidean norm of residual. Nevertheless, the?_Helmholtz_2D/?_Helmholtz_3D routine informs the user that thesolution may not exist in a classical sense (up to rounding errors).

In the spherical solver, the value is used for the special case of a periodicproblem on the entire sphere. This special case is similar to the abovedescribed Cartesian case with pure Neumann boundary conditions. So, herePoisson Library computes the normal solution as well. The parameter is alsoused to detect if the problem is periodic up to rounding errors.

The default value for this parameter is 1.0E-10 in case of double-precisioncomputations or 1.0E-4 in case of single-precision computations. The usermay increase the value of the tolerance, for instance, to avoid the warningsthat may appear.

In the Cartesian case, contain the spectrum of the 1D problem along x-axisafter a call to the ?_commit_Helmholtz_2D/?_commit_Helmholtz_3Droutine. In the spherical case, contains the spectrum of the 1D problemalong φ-axis after a call to the ?_commit_sph_p/?_commit_sph_np routine.

ipar[13]-1throughipar[14]-1

In the Cartesian case, contain the spectrum of the 1D problem along y-axisafter a call to the ?_commit_Helmholtz_2D/?_commit_Helmholtz_3Droutine. These elements are used only in the 3D case. In the spherical case,contains the spherical weights after a call to the?_commit_sph_p/?_commit_sph_np routine.

ipar[15]-1throughipar[16]-1

Take the values of the (staggered) sine/cosine in the mesh points:ipar[17]-1throughipar[18]-1

• along x-axis after a call to the?_commit_Helmholtz_2D/?_commit_Helmholtz_3D routine for aCartesian solver

• along φ-axis after a call to the ?_commit_sph_p/?_commit_sph_nproutine for a spherical solver.

Take the values of the (staggered) sine/cosine in the mesh points:ipar[19]-1throughipar[20]-1

• along y-axis after a call to the?_commit_Helmholtz_2D/?_commit_Helmholtz_3D routine for aCartesian 3D solver

• along φ-axis after a call to the ?_commit_sph_p routine for a sphericalperiodic solver.

2624


DescriptionIndex

These elements are used neither in the 2D Cartesian case nor in thenon-periodic spherical case.

NOTE. You may define the array size depending upon the type of the problem to solve.

Caveat on Parameter Modifications

Flexibility of PL interface makes it possible to skip calling the?_init_Helmholtz_2D/?_init_Helmholtz_3D/?_init_sph_p/?_init_sph_np routine andinitialize the basic data structures explicitly in your code. You may also need to modify contentsof ipar, dpar and spar arrays after initialization. When doing so, you should provide correctand consistent data in the arrays. Mistakenly altered arrays cause errors or wrong computation.You can perform basic check for correctness and consistency of parameters by calling the?_commit_Helmholtz_2D/?_commit_Helmholtz_3D routine but it does not ensure the correctsolution, it only reduces the chance of errors or wrong result.

NOTE. To supply correct and consistent parameters to PL routines, you should haveconsiderable experience in using PL interface and good understanding of the solutionprocess as well as elements that the ipar, spar and dpar arrays contain anddependencies between values of these elements.

However, in rare occurrences, even advanced users may fail in tuning parameters for the FastHelmholtz Solver.

WARNING. The only way that ensures a proper solution of a Helmholtz problem is tofollow a typical sequence of invoking the routines and not change the default set ofparameters. So, avoid modifications of ipar, dpar and spar arrays unless a strong needarises.

2625



Several aspects of the Intel MKL PL interface are platform-specific and language-specific. Topromote portability across platforms and ease of use across different languages, users areprovided with Intel MKL PL language-specific header files to include in their code. Currently,the following of them are available:

• mkl_poisson.h, to be used together with mkl_dfti.h, for C programs.

• mkl_poisson.f90, to be used toghether with mkl_dfti.f90, for Fortran-90 programs.

NOTE. Use of the Intel MKL PL software without including one of the above header filesis not supported.

The include files define function prototypes for appropriate languages.

C-specific Header File

The C-specific header file defines the following function prototypes for the Cartesian solver:

void d_init_Helmholtz_2D(double*, double*, double*, double*, int*, int*,char*, double*, int*, double*, int*);

void d_commit_Helmholtz_2D(double*, double*, double*, double*, double*,DFTI_DESCRIPTOR_HANDLE*, int*, double*, int*);

void d_Helmholtz_2D(double*, double*, double*, double*, double*,DFTI_DESCRIPTOR_HANDLE*, int*, double*, int*);

void s_init_Helmholtz_2D(float*, float*, float*, float*, int*, int*, char*,float*, int*, float*, int*);

void s_commit_Helmholtz_2D(float*, float*, float*, float*, float*,DFTI_DESCRIPTOR_HANDLE*, int*, float*, int*);

2626


void s_Helmholtz_2D(float*, float*, float*, float*, float*,DFTI_DESCRIPTOR_HANDLE*, int*, float*, int*);

void free_Helmholtz_2D(DFTI_DESCRIPTOR_HANDLE*, int*, int*);

void d_init_Helmholtz_3D(double*, double*, double*, double*, double*, double*,int*, int*, int*, char*, double*, int*, double*, int*);

void d_commit_Helmholtz_3D(double*, double*, double*, double*, double*,double*, double*, DFTI_DESCRIPTOR_HANDLE*, DFTI_DESCRIPTOR_HANDLE*, int*,double*, int*);

void d_Helmholtz_3D(double*, double*, double*, double*, double*, double*,double*, DFTI_DESCRIPTOR_HANDLE*, DFTI_DESCRIPTOR_HANDLE*, int*, double*,int*);

void s_init_Helmholtz_3D(float*, float*, float*, float*, float*, float*,int*, int*, int*, char*, float*, int*, float*, int*);

void s_commit_Helmholtz_3D(float*, float*, float*, float*, float*, float*,float*, DFTI_DESCRIPTOR_HANDLE*, DFTI_DESCRIPTOR_HANDLE*, int*, float*,int*);

void s_Helmholtz_3D(float*, float*, float*, float*, float*, float*, float*,DFTI_DESCRIPTOR_HANDLE*, DFTI_DESCRIPTOR_HANDLE*, int*, float*, int*);

void free_Helmholtz_3D(DFTI_DESCRIPTOR_HANDLE*, DFTI_DESCRIPTOR_HANDLE*,int*, int*);

The C-specific header file defines the following function prototypes for the spherical solver:

void d_init_sph_p(double*, double*, double*, double*, int*, int*, double*,int*, double*, int*);

void d_commit_sph_p(double*, DFTI_DESCRIPTOR_HANDLE*, DFTI_DESCRIPTOR_HANDLE*,int*, double*, int*);

void d_sph_p(double*, DFTI_DESCRIPTOR_HANDLE*, DFTI_DESCRIPTOR_HANDLE*, int*,double*, int*);

void s_init_sph_p(float*, float*, float*, float*, int*, int*, float*, int*,float*, int*);

void s_commit_sph_p(float*, DFTI_DESCRIPTOR_HANDLE*, DFTI_DESCRIPTOR_HANDLE*,int*, float*, int*);

void s_sph_p(float*, DFTI_DESCRIPTOR_HANDLE*, DFTI_DESCRIPTOR_HANDLE*, int*,float*, int*);

void free_sph_p(DFTI_DESCRIPTOR_HANDLE*, DFTI_DESCRIPTOR_HANDLE*, int*,int*);

void d_init_sph_np(double*, double*, double*, double*, int*, int*, double*,int*, double*, int*);

2627


void d_commit_sph_np(double*, DFTI_DESCRIPTOR_HANDLE*, int*, double*, int*);

void d_sph_np(double*, DFTI_DESCRIPTOR_HANDLE*, int*, double*, int*);

void s_init_sph_np(float*, float*, float*, float*, int*, int*, float*, int*,float*, int*);

void s_commit_sph_np(float*, DFTI_DESCRIPTOR_HANDLE*, int*, float*, int*);

void s_sph_np(float*, DFTI_DESCRIPTOR_HANDLE*, int*, float*, int*);

void free_sph_np(DFTI_DESCRIPTOR_HANDLE*, int*, int*);

Fortran-Specific Header File

The Fortran90-specific header file defines the following function prototypes for the Cartesiansolver:

SUBROUTINE D_INIT_HELMHOLTZ_2D (AX, BX, AY, BY, NX, NY, BCTYPE, Q, IPAR,DPAR, STAT)

USE MKL_DFTI

INTEGER NX, NY, STAT

INTEGER IPAR(*)

DOUBLE PRECISION AX, BX, AY, BY, Q

DOUBLE PRECISION DPAR(*)

CHARACTER(4) BCTYPE

END SUBROUTINE

SUBROUTINE D_COMMIT_HELMHOLTZ_2D (F, BD_AX, BD_BX, BD_AY, BD_BY, XHANDLE,IPAR, DPAR, STAT)

USE MKL_DFTI

INTEGER STAT

INTEGER IPAR(*)

DOUBLE PRECISION F(IPAR(11)+1,*)


2628


DOUBLE PRECISION BD_AX(*), BD_BX(*), BD_AY(*), BD_BY(*)

TYPE(DFTI_DESCRIPTOR), POINTER :: XHANDLE

END SUBROUTINE

SUBROUTINE D_HELMHOLTZ_2D (F, BD_AX, BD_BX, BD_AY, BD_BY, XHANDLE, IPAR,DPAR, STAT)

USE MKL_DFTI

INTEGER STAT

INTEGER IPAR(*)



DOUBLE PRECISION BD_AX(*), BD_BX(*), BD_AY(*), BD_BY(*)


END SUBROUTINE

SUBROUTINE S_INIT_HELMHOLTZ_2D (AX, BX, AY, BY, NX, NY, BCTYPE, Q, IPAR,SPAR, STAT)

USE MKL_DFTI

INTEGER NX, NY, STAT

INTEGER IPAR(*)

REAL AX, BX, AY, BY, Q

REAL SPAR(*)

CHARACTER(4) BCTYPE

END SUBROUTINE

2629


SUBROUTINE S_COMMIT_HELMHOLTZ_2D (F, BD_AX, BD_BX, BD_AY, BD_BY, XHANDLE,IPAR, SPAR, STAT)

USE MKL_DFTI

INTEGER STAT

INTEGER IPAR(*)

REAL F(IPAR(11)+1,*)

REAL SPAR(*)

REAL BD_AX(*), BD_BX(*), BD_AY(*), BD_BY(*)


END SUBROUTINE

SUBROUTINE S_HELMHOLTZ_2D (F, BD_AX, BD_BX, BD_AY, BD_BY, XHANDLE, IPAR,SPAR, STAT)

USE MKL_DFTI

INTEGER STAT

INTEGER IPAR(*)


REAL SPAR(*)

REAL BD_AX(*), BD_BX(*), BD_AY(*), BD_BY(*)


END SUBROUTINE

SUBROUTINE FREE_HELMHOLTZ_2D (XHANDLE, IPAR, STAT)

USE MKL_DFTI

2630


INTEGER STAT

INTEGER IPAR(*)


END SUBROUTINE

SUBROUTINE D_INIT_HELMHOLTZ_3D (AX, BX, AY, BY, AZ, BZ, NX, NY, NZ, BCTYPE,Q, IPAR, DPAR, STAT)

USE MKL_DFTI

INTEGER NX, NY, NZ, STAT

INTEGER IPAR(*)

DOUBLE PRECISION AX, BX, AY, BY, AZ, BZ, Q


CHARACTER(6) BCTYPE

END SUBROUTINE

SUBROUTINE D_COMMIT_HELMHOLTZ_3D (F, BD_AX, BD_BX, BD_AY, BD_BY, BD_AZ,BD_BZ, XHANDLE, YHANDLE, IPAR, DPAR, STAT)

USE MKL_DFTI

2631


INTEGER STAT

INTEGER IPAR(*)

DOUBLE PRECISION F(IPAR(11)+1,IPAR(12)+1,*)


DOUBLE PRECISION BD_AX(IPAR(12)+1,*), BD_BX(IPAR(12)+1,*),BD_AY(IPAR(11)+1,*)

DOUBLE PRECISION BD_BY(IPAR(11)+1,*), BD_AZ(IPAR(11)+1,*),BD_BZ(IPAR(11)+1,*)

TYPE(DFTI_DESCRIPTOR), POINTER :: XHANDLE, YHANDLE

END SUBROUTINE

SUBROUTINE D_HELMHOLTZ_3D (F, BD_AX, BD_BX, BD_AY, BD_BY, BD_AZ, BD_BZ,XHANDLE, YHANDLE, IPAR, DPAR, STAT)

USE MKL_DFTI

INTEGER STAT

INTEGER IPAR(*)

DOUBLE PRECISION F(IPAR(11)+1,IPAR(12)+1,*)


DOUBLE PRECISION BD_AX(IPAR(12)+1,*), BD_BX(IPAR(12)+1,*),BD_AY(IPAR(11)+1,*)

DOUBLE PRECISION BD_BY(IPAR(11)+1,*), BD_AZ(IPAR(11)+1,*),BD_BZ(IPAR(11)+1,*)


END SUBROUTINE

SUBROUTINE S_INIT_HELMHOLTZ_3D (AX, BX, AY, BY, AZ, BZ, NX, NY, NZ, BCTYPE,Q, IPAR, SPAR, STAT)

USE MKL_DFTI

2632


INTEGER NX, NY, NZ, STAT

INTEGER IPAR(*)

REAL AX, BX, AY, BY, AZ, BZ, Q

REAL SPAR(*)

CHARACTER(6) BCTYPE

END SUBROUTINE

SUBROUTINE S_COMMIT_HELMHOLTZ_3D (F, BD_AX, BD_BX, BD_AY, BD_BY, BD_AZ,BD_BZ, XHANDLE, YHANDLE, IPAR, SPAR, STAT)

USE MKL_DFTI

INTEGER STAT

INTEGER IPAR(*)

REAL F(IPAR(11)+1,IPAR(12)+1,*)

REAL SPAR(*)

REAL BD_AX(IPAR(12)+1,*), BD_BX(IPAR(12)+1,*), BD_AY(IPAR(11)+1,*)

REAL BD_BY(IPAR(11)+1,*), BD_AZ(IPAR(11)+1,*), BD_BZ(IPAR(11)+1,*)


END SUBROUTINE

SUBROUTINE S_HELMHOLTZ_3D (F, BD_AX, BD_BX, BD_AY, BD_BY, BD_AZ, BD_BZ,XHANDLE, YHANDLE, IPAR, SPAR, STAT)

USE MKL_DFTI

2633


INTEGER STAT

INTEGER IPAR(*)

REAL F(IPAR(11)+1,IPAR(12)+1,*)

REAL SPAR(*)

REAL BD_AX(IPAR(12)+1,*), BD_BX(IPAR(12)+1,*), BD_AY(IPAR(11)+1,*)

REAL BD_BY(IPAR(11)+1,*), BD_AZ(IPAR(11)+1,*), BD_BZ(IPAR(11)+1,*)


END SUBROUTINE

SUBROUTINE FREE_HELMHOLTZ_3D (XHANDLE, YHANDLE, IPAR, STAT)

USE MKL_DFTI

INTEGER STAT

INTEGER IPAR(*)


END SUBROUTINE

The Fortran90-specific header file defines the following function prototypes for the sphericalsolver:

SUBROUTINE D_INIT_SPH_P(AP,BP,AT,BT,NP,NT,Q,IPAR,DPAR,STAT)

USE MKL_DFTI

INTEGER NP, NT, STAT

INTEGER IPAR(*)

DOUBLE PRECISION AP,BP,AT,BT,Q


END SUBROUTINE

2634


SUBROUTINE D_COMMIT_SPH_P(F,HANDLE_S,HANDLE_C,IPAR,DPAR,STAT)

USE MKL_DFTI

INTEGER STAT

INTEGER IPAR(*)



TYPE(DFTI_DESCRIPTOR), POINTER :: HANDLE_C, HANDLE_S

END SUBROUTINE

SUBROUTINE D_SPH_P(F,HANDLE_S,HANDLE_C,IPAR,DPAR,STAT)

USE MKL_DFTI

INTEGER STAT

INTEGER IPAR(*)




END SUBROUTINE

SUBROUTINE S_INIT_SPH_P(AP,BP,AT,BT,NP,NT,Q,IPAR,SPAR,STAT)

USE MKL_DFTI


INTEGER IPAR(*)

REAL AP,BP,AT,BT,Q

REAL SPAR(*)

2635


END SUBROUTINE

SUBROUTINE S_COMMIT_SPH_P(F,HANDLE_S,HANDLE_C,IPAR,SPAR,STAT)

USE MKL_DFTI

INTEGER STAT

INTEGER IPAR(*)

REAL SPAR(*)



END SUBROUTINE

SUBROUTINE S_SPH_P(F,HANDLE_S,HANDLE_C,IPAR,SPAR,STAT)

USE MKL_DFTI

INTEGER STAT

INTEGER IPAR(*)

REAL SPAR(*)



END SUBROUTINE

SUBROUTINE FREE_SPH_P(HANDLE_S,HANDLE_C,IPAR,STAT)

USE MKL_DFTI

2636


INTEGER STAT

INTEGER IPAR(*)

TYPE(DFTI_DESCRIPTOR), POINTER :: HANDLE_S, HANDLE_C

END SUBROUTINE

SUBROUTINE D_INIT_SPH_NP(AP,BP,AT,BT,NP,NT,Q,IPAR,DPAR,STAT)

USE MKL_DFTI


INTEGER IPAR(*)

DOUBLE PRECISION AP,BP,AT,BT,Q


END SUBROUTINE

SUBROUTINE D_COMMIT_SPH_NP(F,HANDLE,IPAR,DPAR,STAT)

USE MKL_DFTI

INTEGER STAT

INTEGER IPAR(*)



TYPE(DFTI_DESCRIPTOR), POINTER :: HANDLE

END SUBROUTINE

SUBROUTINE D_SPH_NP(F,HANDLE,IPAR,DPAR,STAT)

USE MKL_DFTI

2637


INTEGER STAT

INTEGER IPAR(*)




END SUBROUTINE

SUBROUTINE S_INIT_SPH_NP(AP,BP,AT,BT,NP,NT,Q,IPAR,SPAR,STAT)

USE MKL_DFTI


INTEGER IPAR(*)

REAL AP,BP,AT,BT,Q

REAL SPAR(*)

END SUBROUTINE

SUBROUTINE S_COMMIT_SPH_NP(F,HANDLE,IPAR,SPAR,STAT)

USE MKL_DFTI

INTEGER STAT

INTEGER IPAR(*)

REAL SPAR(*)

REALF(IPAR(11)+1,*)


END SUBROUTINE

2638


SUBROUTINE S_SPH_NP(F,HANDLE,IPAR,SPAR,STAT)

USE MKL_DFTI

INTEGER STAT

INTEGER IPAR(*)

REAL SPAR(*)



END SUBROUTINE

SUBROUTINE FREE_SPH_NP(HANDLE,IPAR,STAT)

USE MKL_DFTI

INTEGER STAT

INTEGER IPAR(*)


END SUBROUTINE

Fortran-90 specifics of the PL routines usage are similar for all Intel MKL PDE support tools anddescribed in the Calling PDE Support Routines from Fortran-90 section.

Calling PDE Support Routines from Fortran-90The calling interface for all Intel MKL TT and PL routines is designed to be easily used in C.However, any of the TT and PL routines can be invoked directly from Fortran-90 if users arefamiliar with the inter-language calling conventions of their platforms.

Neither TT nor PL interface can be invoked from Fortran-77 due to restrictions imposed by theuse of Intel MKL DFT interface.

The inter-language calling conventions include, but are not limited to, the argument passingmechanisms for the language, the data type mappings from C to Fortran-90 and how C externalnames are decorated on the platform.

2639


To promote portability and relieve a user of dealing with the calling conventions specifics,Fortran-90 header file mkl_trig_transforms.f90 for TT routines and mkl_poisson.f90 for PLroutines, used together with mkl_dfti.f90, declare a set of macros and introduce type definitionsintended to hide the inter-language calling conventions and provide an interface to the routinesthat looks natural in Fortran-90.

For example, consider a hypothetical library routine, foo, which takes a double-precision vectorof length n. C users would access such a function as follows:

int n;

double *x;

…

foo(x, &n);

As noted above, to invoke foo, Fortran-90 users would need to know what Fortran-90 datatypes correspond to C types int and double (or float in case of single-precision), whatargument-passing mechanism the C compiler uses and what, if any, name decoration isperformed by the C compiler when generating the external symbol foo.However, with theFortran-90 header files mkl_trig_transforms.f90 / mkl_poisson.f90 and mkl_dfti.f90 included,the invocation of foo within a Fortran-90 program will look as follows:

• For TT interface,use mkl_dfti

use mkl_trig_transforms

INTEGER n

DOUBLE PRECISION, ALLOCATABLE :: x

…

CALL FOO(x,n)

• For PL interface,use mkl_dfti

use mkl_poisson

INTEGER n

DOUBLE PRECISION, ALLOCATABLE :: x

…

CALL FOO(x,n)

2640


Note that in the above example, the header files mkl_trig_transforms.f90 / mkl_poisson.f90and mkl_dfti.f90 provide a definition for the subroutine FOO. To ease the use of PL or TT routinesin Fortran-90, the general approach of providing Fortran-90 definitions of names is usedthroughout the libraries. Specifically, if a name from a PL or TT interface is documented ashaving the C-specific name foo, then the Fortran-90 header files provide an appropriateFortran-90 language type definition FOO.

One of the key differences between Fortran-90 and C is the language argument-passingmechanism: C programs use pass-by-value semantics and Fortran-90 programs usepass-by-reference semantics. The Fortran-90 headers ensure proper treatment of this difference.In particular, in the above example, the header files mkl_trig_transforms.f90 / mkl_poisson.f90and mkl_dfti.f90 hide the difference by defining a macro FOO that takes the address of theappropriate arguments.

2641


14Optimization Solvers Routines

Intel® Math Kernel Library provides tools for solving optimization problems. These tools are routines forsolving nonlinear least squares problems through the Trust-Region (TR) algorithms.

Intel MKL Optimization solver routines that can be used for:

– solving nonlinear least-squares problems without constraints

– solving nonlinear least-squares problems with boundary constraints

– computing Jacobi matrix by central differences for solving nonlinear least-square problem.

The first two groups of routines are designed to find only local minimum point. However problems canhave multiple local minima and trying different initial points is recommended for better solutions.

For more information on terms and key concepts required to understand the use of the Intel MKL nonlinearleast-squares problem solver routines, see Appendix F, “Optimization Solvers Basics”.

Routines described below are subdivided according to their purpose as follows:

Nonlinear Least-Squares Problem without Constraints

Nonlinear Least-Squares Problem with Linear (Boundary) Constraints

Jacobi Matrix Calculation Routines.

Organization and ImplementationThe TR solvers have RCI-based interfaces. The RCI TR interface implements a group of user-callableroutines that are used in the step-by-step solving process for nonlinear least square problem andfollows the general RCI scheme described in [Dong95]. RCI means that user is supposed to performcertain operations for the solver (for example, to provide values of the objective function or Jacobi

2643

matrix). When the solver needs the results of such operations, the user should pass them tothe solver. This makes the solver independent of specific implementation of the operations.However, this approach requires some additional work by the user.

Figure 14-1 Typical order for invoking RCI solver routines

The Trust-Region solvers implement with OpenMP* support. For using multiprocessing mode,set the number of threads in environment variable OMP_NUM_THREADS.

Memory Allocation and Handles

To make the routines easy to use, the user is not required to allocate temporary workingstorage. Any required memory is allocated by the solver. To allow multiple users to access thesolver simultaneously, the solver keeps track of the storage allocated for a particular applicationby using an opaque data object called a handle. Each of the Intel RCI TR solver routine eithercreates, uses or deletes a handle.

Handle declaration varies from language to language. This document declares it as to be of_TRNSP_HANDLE_t or _TRNSPBC_HANDLE_t type.

2644


C and C++ programmers should declare a handle as:

#include "mkl_rci.h"

_TRNSP_HANDLE_t handle;

or

_TRNSPBC_HANDLE_t handle;

Fortran programmers using compilers that support eight byte integers should declare a handleas:

INCLUDE "mkl_rci.fi"

INTEGER*8 handle

In addition to the necessary definition for the correct declaration of a handle, the include filealso defines the following:

• function prototypes for languages that support prototypes

• symbolic constants that are used for the error returns

• user options for the solver routines

• message severity.

Nonlinear Least-Squares Problem without ConstraintsThe nonlinear least squares problem without constraints can be described as follows:

where F(x) : Rn → R is a twice differentiable function in Rn. Solving of nonlinear least squaresproblem is searching for the best approximation to vector y with model function fi(x), whichhas nonlinear dependence on variables х. The best approximation means that the sum ofsquares of residuals yi - fi(x) is the lowest possible.

Table 14-1 RCI TR Routines

OperationRoutine Name

Initializes the solver.dtrnlsp_init

2645

Optimization Solvers Routines 14

Solves a nonlinear least-squares problem on the basis of RCIusing the Trust-Region algorithm.

dtrnlsp_solve

Retrieves the number of iterations, stop criterion, initialresidual, and final residual.

dtrnlsp_get

Removes data.dtrnlsp_delete

dtrnlsp_initInitializes the solver of nonlinear least squareproblem.

Syntax

Fortran:

res = dtrnlsp_init(handle, n, m, x, eps, iter1, iter2, rs)

C:

res = dtrnlsp_init(&handle, &n, &m, x, eps, &iter1, &iter2, &rs);

Description

The routine initializes the solver. After initialization all subsequent invocations of dtrnlsp_solveroutine should use the values of the handle returned by dtrnlsp_init.

The eps array contains the stopping tests:

eps(1): Δk < eps(1)

eps(2): ||F(x)|| < eps(2)

eps(3): ||A(x)ij|| < eps(3)

eps(4): ||s|| < eps(4)

eps(5): ||F(x)||- ||F(x) - A(x)s|| < eps(5)

eps(6): trial step precision. If eps(6) = 0, then eps(6) = 1.d-10,

where A is a Jacobi matrix.

2646

Input Parameters

INTEGER. Length of X.n

INTEGER. Length of F(x).m

DOUBLE PRECISION. Array of size n. Initial guess.x

DOUBLE PRECISION. Array of size 6; contains stopping tests. See thevalues in Description.

eps

INTEGER. Specifies the maximum number of iterations.iter1

INTEGER. Specifies the maximum number of iterations of trial-stepcalculation.

iter2

DOUBLE PRECISION. Positive input variable used in determining theinitial step bound. In most cases the factor should lie within the interval(0.1, 100.0). The generally recommended value is 100.

rs

Output Parameters

Data object of _TRNSP_HANDLE_t type for C/C++ programmers andINTEGER*8 for FORTRAN programmers.

handle

INTEGER. Informs about the task completion.resres = TR_SUCCESS means the routine completed the task normally.res = TR_INVALID_OPTION means an error in the input parameters.res = TR_OUT_OF_MEMORY means a memory error.

dtrnlsp_solveSolves a nonlinear least-squares problem usingTrust-Region algorithm.

Syntax

Fortran:

res = dtrnlsp_solve(handle, fvec, fjac, RCI_Request)

C:

res = dtrnlsp_solve(&handle, fvec, fjac, &RCI_Request);

2647


Description

The dtrnlsp_solve routine based on Reverse Communication Interface (RCI) uses theTrust-Region algorithm to solve nonlinear least squares problems. The problem is stated asfollows:

where m ≥ n, F : Rn → Rm and fi(x) is the i-th component function of F(x). From a currentpoint, the algorithm uses the trust region approach:

to get the new point xn as solution to the following problem:

where s is the trial step, and ||s||2 ≤ Δc

then xn = xc + s.

The RCI_Request parameter informs about the task completion and may have the followingvalues:

RCI_Request= 2 - calculate the Jacobian matrix and put the result into fjac.

RCI_Request= 1 - recalculate the function at vector X and put the result into fvec.

RCI_Request= 0 - successful completion of the task

RCI_Request= -1 - the algorithm has exceeded the maximal number of iterations.

RCI_Request= -2 - Δk < eps(1)

RCI_Request= -3 - ||F(x)|| < eps(2)

2648

RCI_Request= -4 - ||A(x)ij|| < eps(3)

RCI_Request= -5 - ||s|| < eps(4)

RCI_Request= -6 -||F(x)||- ||F(x) - A(x)s|| < eps(5),


Input Parameters


handle

DOUBLE PRECISION. Array of size m. Contains the function values atX, where fvec(i) = (yi – fi(x)).

fvec

DOUBLE PRECISION. Array of size (m,n). Contains the Jacobi matrix ofthe function.

fjac

Output Parameters

DOUBLE PRECISION. Array of size m. Contains the updated functionvalues at X

fvec

INTEGER. Informs about the task completion. When equal to 0, thetask is completed succefully.

RCI_Request

See Description for the other values of the parameter and their meaning.

INTEGER. Informs about the task completion.resres = TR_SUCCESS means the routine completed the task normally.

dtrnlsp_getRetrieves the number of iterations, stop criterion,initial residual, and final residual.

Syntax

Fortran:

res = dtrnlsp_get(handle, iter, st_cr, r1, r2)

C:

res = dtrnlsp_get(&handle, &iter, &st_cr, &r1, &r2);

2649

Description

The routine retrieves the number of current iterations, stop criterion, initial residual, and finalresidual.

The st_cr parameter contains the stop criterion:

st_cr = 1 - the algorithm has exceeded the maximal number of iterations.

st_cr = 2 - Δk < eps(1)

st_cr = 3 - ||F(x)|| < eps(2)

st_cr = 4 - ||A(x)ij|| < eps(3)

st_cr = 5 - ||s|| < eps(4)

st_cr = 6 - ||F(x)||- ||F(x) - A(x)s|| < eps(5),


Input Parameters


handle

Output Parameters

INTEGER. Contains the current number of iterations.iter

INTEGER. Contains the stop criterion.st_crSee Description for the parameter values and their meanings.

DOUBLE PRECISION. Contains the initial residual, that is, the functionalvalue (||y - f(x)||) of the initial х set by the user.

r1

DOUBLE PRECISION. Contains the final residual, that is, the functionalvalue (||y - f(x)||) of the final х resulting from the algorithmoperation.

r2

INTEGER. Informs about task completion.resres = TR_SUCCESS means the routine completed the task normally.

2650

dtrnlsp_deleteRemoves data object required by TR solver.

Syntax

Fortran:

res = dtrnlsp_delete(handle)

C:

res = dtrnlsp_delete(&handle);

Description

The routine removes a data object needed for the RCI TR solver.

Input Parameters


handle

Output Parameters


2651


Examples of dtrnlsp Usage

Example 14-1. dtrnlsp Usage in FortranC** NONLINEAR LEAST SQUARE PROBLEM WITHOUT BOUNDARY CONSTRAINTS

PROGRAM EXAMPLE_DTRNLSP_POWELL

IMPLICIT NONE

C** HEADER-FILE WITH DEFINITIONS (CONSTANTS, EXTERNALS)

INCLUDE 'mkl_rci.fi'

C** USER’S OBJECTIVE FUNCTION

EXTERNAL EXTENDET_POWELL

C** N - NUMBER OF FUNCTION VARIABLES

INTEGER N

PARAMETER (N = 40)

C** M - DIMENSION OF FUNCTION VALUE

INTEGER M

PARAMETER (M = 40)

C** SOLUTION VECTOR. CONTAINS VALUES X FOR F(X)

DOUBLE PRECISION X (N)

C** PRECISIONS FOR STOP-CRITERIA (SEE MANUAL FOR MORE DETAILS)

DOUBLE PRECISION ESP (6)

C** JACOBI CALCULATION PRECISION

DOUBLE PRECISION JAC_EPS

C** REVERSE COMMUNICATION INTERFACE PARAMETER

INTEGRER RCI_REQUEST

C** FUNCTION (F(X)) VALUE VECTOR

DOUBLE PRECISION FVEC (M)

C** JACOBI MATRIX

DOUBLE PRECISION FJAC (M, N)

2652


C** NUMBER OF ITERATIONS

INTEGRER ITER

C** NUMBER OF STOP-CRITERION

INTEGRER ST_CR

C** CONTROLS OF RCI CYCLE

INTEGRER SUCCESSFUL

C** MAXIMUM NUMBER OF ITERATIONS

INTEGRER ITER1

C** MAXIMUM NUMBER OF ITERATIONS OF CALCULATION OF TRIAL-STEP

INTEGRER ITER2

C** INITIAL STEP BOUND

DOUBLE PRECISION RS

C** INITIAL AND FINAL RESIDUALS

DOUBLE PRECISION R1, R2

C** TR SOLVER HANDLE

INTEGRER*8 HANDLE

C** CYCLE’S COUNTERS

INTEGRER I, J

C** SET PRECISIONS FOR STOP-CRITERIA

EPS (1:6) = 1.D-5

C** SET MAXIMUM NUMBER OF ITERATIONS

ITER1 = 1000

C** SET MAXIMUM NUMBER OF ITERATIONS OF CALCULATION OF TRIAL-STEP

ITER2 = 100

C** SET INITIAL STEP BOUND

RS = 100.D0

C** PRECISIONS FOR JACOBI CALCULATION

2653


JAC_EPS = 1.D-8

C** SET THE INITIAL GUESS

DO I = 1, N/4

X (4*I - 3) = 3.D0

X (4*I - 2) = -1.D0

X (4*I - 1) = 0.D0

X (4*I) = 1.D0

ENDDO

C** SET INITIAL VALUES

DO I = 1, M

FVEC (I) = 0.D0

DO J = 1, N

FJAC (I, J) = 0.D0

ENDDO

ENDDO

C** INITIALIZE SOLVER (ALLOCATE MEMORY, SET INITIAL VALUES)

C** HANDLE IN/OUT: TR SOLVER HANDLE

C** N IN: NUMBER OF FUNCTION VARIABLES

C** M IN: DIMENSION OF FUNCTION VALUE

C** X IN: SOLUTION VECTOR. CONTAINS VALUES X FOR F(X)

C** EPS IN: PRECISIONS FOR STOP-CRITERIA

C** ITER1 IN: MAXIMUM NUMBER OF ITERATIONS

C** ITER2 IN: MAXIMUM NUMBER OF ITERATIONS OF CALCULATION OFTRIAL-STEP

C** RS IN: INITIAL STEP BOUND

IF (DTRNLSP_INIT (HANDLE, N, M, X, EPS, ITER1, ITER2, RS)

+ /= TR_SUCCESS) THEN

C** IF FUNCTION DOES NOT COMPLETE SUCCESSFULLY THEN PRINT ERROR MESSAGE

PRINT *, '| ERROR IN DTRNLSP_INIT'

2654


C** AND STOP

STOP

ENDIF

C** SET INITIAL RCI CYCLE VARIABLES

RCI_REQUEST = 0

SUCCESSFUL = 0

C** RCI CYCLE

DO WHILE (SUCCESSFUL == 0)

C** CALL TR SOLVER


C** FVEC IN: VECTOR

C** FJAC IN: JACOBI MATRIX

C** RCI_REQUEST IN/OUT: RETURN NUMBER THAT DENOTES NEXT STEP FORPERFORMING

IF (DTRNLSP_SOLVE (HANDLE, FVEC, FJAC, RCI_REQUEST)



PRINT *, '| ERROR IN DTRNLSP_SOLVE'

C** AND STOP

STOP

ENDIF


C** ACCORDING TO RCI_REQUEST VALUE WE DO NEXT STEP

SELECT CASE (RCI_REQUEST)

CASE (-1, -2, -3, -4, -5, -6)

C** STOP RCI CYCLE

SUCCESSFUL = 1

CASE (1)

2655


C** RECALCULATE FUNCTION VALUE



C** X IN: SOLUTION VECTOR

C** FVEC OUT: FUNCTION VALUE F(X)

CALL EXTENDET_POWELL (M, N, X, FVEC)

CASE (2)

C** COMPUTE JACOBI MATRIX

C** EXTENDET_POWELL IN: EXTERNAL OBJECTIVE FUNCTION



C** FJAC OUT: JACOBI MATRIX


C** JAC_EPS IN: JACOBI CALCULATION PRECISION

IF (DJACOBI (EXTENDET_POWELL, N, M, FJAC, X, JAC_EPS)



PRINT *, '| ERROR IN DTRNLSP_SOLVE'

C** AND STOP

STOP

ENDIF

ENDSELECT

ENDDO

C** GET SOLUTION STATUSES

C** HANDLE IN: TR SOLVER HANDLE

C** ITER OUT: NUMBER OF ITERATIONS

C** ST_CR OUT: NUMBER OF STOP CRITERION

C** R1 OUT: INITIAL RESIDUALS

2656


C** R2 OUT: FINAL RESIDUALS

IF (DTRNLSP_GET (HANDLE, ITER, ST_CR, R1_R2)



PRINT *, '| ERROR IN DTRNLSP_GET'

C** AND STOP

STOP

ENDIF

C** FREE HANDLE MEMORY

IF (DTRNLSP_DELETE (HANDLE) /= TR_SUCCESS) THEN


PRINT *, '| ERROR IN DTRNLSP_DELETE'

C** AND STOP

STOP

ENDIF

C** IF FINAL RESIDUAL IS LESS THAN REQUIRED PRECISION THEN PRINT PASS

IF (R2 < 1.D-5) THEN

PRINT *, '| DTRNLSP POWELL............PASS'!, R1, R2

C** ELSE PRINT FAILED

ELSE

PRINT *, '| DTRNLSP POWELL............FAILED'!, R1,R2

ENDIF

END PROGRAM EXAMPLE_DTRNLSP_POWELL

C** ROUTINE FOR EXTENDET POWELL FUNCTION CALCULATION


2657


C** X IN: VECTOR FOR FUNCTION CALCULATION

C** F OUT: FUNCTION VALUE F(X)

SUBROUTINE EXTENDET_POWELL (M, N, X, F)

IMPLICIT NONE

INTEGER M, N

DOUBLE PRECISION X (*), F (*)

INTEGER I

DO I = 1, N/4

F (4*I-3) = X(4*I - 3) + 10.D0 * X(4*I - 2)

F (4*I-2) = 2.2360679774997896964091736687313D0*(X(4*I-1) -X(4*I))

F (4*I-1) = (X(4*I-2) - 2.D0*X(4*I-1))**2

F (4*I) = 3.1622776601683793319988935444327D0*(X(4*I-3) -X(4*I))**2

ENDDO

ENDSUBROUTINE EXTENDET_POWELL

2658


Example 14-2. dtrnlsp Usage in C#include <stdio.h>

#include <malloc.h>

#include <math.h>


/* nonlinear least square problem without boundary constraints */

int main ()

{

/* user’s objective function */

extern void extendet_powell (int *, int *, double*, double*);

/* n - number of function variables

m - dimension of function value */

int n = 4, m = 4;

/* precisions for stop-criteria (see manual for more details) */

double eps[6];

/* solution vector. contains values x for f(x) */

double *x;

/* iter1 - maximum number of iterations

iter2 - maximum number of iterations of calculation of trial-step */

int iter1 = 1000, iter2 = 100;

/* initial step bound */

double rs = 0.0;

/* reverse communication interface parameter */

int RCI_Request; // reverse communication interfacevariable

/* controls of rci cycle */

int successful;

2659

/* function (f(x)) value vector */

double *fvec;

/* jacobi matrix */

double *fjac;

/* number of iterations */

int iter;

/* number of stop-criterion */

int st_cr;

/* initial and final residuals */

double r1, r2;

/* TR solver handle */

_TRNSP_HANDLE_t handle; // TR solver handle

/* cycle’s counter */

int i;

/* memory allocation */

x = (double*) malloc (sizeof (double)*n);

fvec = (double*) malloc (sizeof (double)*m);

fjac = (double*) malloc (sizeof (double)*m*n);

/* set precisions for stop-criteria */

for (i = 0; i < 6; i++)

{

eps [i] = 0.00001;

}

/* set the initial guess */

for (i = 0; i < n/4; i++)

{

x [4*i] = 3.0;

2660

x [4*i + 1] = -1.0;

x [4*i + 2] = 0.0;

x [4*i + 3] = 1.0;

}

/* set the initial values */

for (i = 0; i < m; i++)

fvec [i] = 0.0;

for (i = 0; i < m*n; i++)

fjac [i] = 0.0;

/* initialize solver (allocate memory, set initial values)

handle in/out: TR solver handle

n in: number of function variables

m in: dimension of function value

x in: solution vector. contains values x for f(x)

eps in: precisions for stop-criteria

iter1 in: maximum number of iterations

iter2 in: maximum number of iterations of calculation oftrial-step

rs in: initial step bound */

if (dtrnlsp_init (&handle, &n, &m, x, eps, &iter1, &iter2, &rs) !=TR_SUCCESS)

{

/* if function does not complete successfully then print error message*/

printf ("| error in dtrnlsp_init\n");

/* and exit */

return 0;

}

/* set initial rci cycle variables */

RCI_Request = 0;

2661

successful = 0;

/* rci cycle */

while (successful == 0)

{

/* call tr solver

handle in/out: tr solver handle

fvec in: vector

fjac in: jacobi matrix

RCI_request in/out: return number which denotes next step forperforming */

if (dtrnlsp_solve (&handle, fvec, fjac, &RCI_Request) != TR_SUCCESS)

{

/* if function does not complete successfully then print errormessage */

printf ("| error in dtrnlsp_solve\n");

/* and exit */

return 0;

}

/* according to rci_request value we do next step */

if (RCI_Request == -1 ||

RCI_Request == -2 ||




RCI_Request == -6)

/* exit rci cycle */

successful = 1;

if (RCI_Request == 1)

{

2662


/* recalculate function value



x in: solution vector

fvec out: function value f(x) */

extendet_powell (&m, &n, x, fvec);

}


{

/* compute jacobi matrix

extendet_powell in: external objective function



fjac out: jacobi matrix

2663



jac_eps in: jacobi calculation precision */

if (djacobi (extendet_powell, &n, &m, fjac, x, eps) != TR_SUCCESS)

{

/* if function does not complete successfully then printerror message */

printf ("| error in djacobi\n");

/* and exit */

return 0;

}

}

}

/* get solution statuses

handle in: TR solver handle

iter out: number of iterations

st_cr out: number of stop criterion

r1 out: initial residuals

r2 out: final residuals */

if (dtrnlsp_get (&handle, &iter, &st_cr, &r1, &r2) != TR_SUCCESS)

{


printf ("| error in dtrnlsp_get\n");

/* and exit */

return 0;

}

/* free handle memory */

if (dtrnlsp_delete (&handle) != TR_SUCCESS)

{

2664


printf ("| error in dtrnlsp_delete\n");

/* and exit */

return 0;

}

/* free allocated memory */

free (x);

free (fvec);

free (fjac);

/* if final residual is less than required precision then print pass */

if (r2 < 0.00001)

printf ("| dtrnlsp powell............PASS\n");

/* else print failed */

else

printf ("| dtrnlsp powell............FAILED\n");

return 0;

}

/* nonlinear system equations without constraints */

/* routine for extendet powell function calculation



x in: vector for function calculation

f out: function value f(x) */

void extendet_powell (int *m, int *n, double *x, double *f)

{

int i;

2665

for (i = 0; i < (*n)/4; i++)

{

f [4*i] = x [4*i] + 10.0*x [4*i + 1];

f [4*i + 1] = 2.2360679774997896964091736687313*(x [4*i + 2] - x [4*i+ 3]);

f [4*i + 2] = (x [4*i + 1] - 2.0*x [4*i + 2])*(x [4*i + 1] - 2.0*x[4*i + 2]);

f [4*i + 3] = 3.1622776601683793319988935444327*(x [4*i] - x [4*i +3])*(x [4*i] - x [4*i + 3]);

}

return;

}

Nonlinear Least-Squares Problem with Linear (Bound)Constraints

The nonlinear least squares problem with linear bound constraints can be described and solvedin the same way as the nonlinear least squares problem without constraints but it has thefollowing constraints:

Table 14-2 RCI TR Routines for Problem with Bound Constraints


Initializes the solver.dtrnlspbc_init

Solves a nonlinear least-squares problem on the basis of RCIusing the Trust-Region algorithm.

dtrnlspbc_solve

Retrieves the number of iterations, stop criterion, initialresidual, and final residual.

dtrnlspbc_get

Removes data.dtrnlspbc_delete

2666

dtrnlspbc_initInitializes the solver of nonlinear least squareproblem with linear (boundary) constraints.

Syntax

Fortran:

res = dtrnlspbc_init(handle, n, m, x, LW, UP, eps, iter1, iter2, rs)

C:

res = dtrnlspbc_init(&handle, &n, &m, x, LW, UP, eps, &iter1, &iter2, &rs);

Description

The routine initializes the solver. After initialization all subsequent invocations ofdtrnlspbc_solve routine should use the values of the handle returned by dtrnlspbc_init.

The eps array contains the stopping tests:

eps(1): Δk < eps(1)

eps(2): ||F(x)|| < eps(2)

eps(3): ||A(x)ij|| < eps(3)

eps(4): ||s|| < eps(4)

eps(5): ||F(x)||- ||F(x) - A(x)s|| < eps(5)

eps(6): trial step precision. If eps(6) = 0, then eps(6) = 1.d-10,


Input Parameters


INTEGER. Length of F(x).m

DOUBLE PRECISION. Array of size n. Initial guess.x

DOUBLE PRECISION. Array of size n.LW

Contains low bounds for x (lwi ≤ xi).

DOUBLE PRECISION. Array of size n.UP

Contains upper bounds for x (upi ≤ xi).

2667

DOUBLE PRECISION. Array of size 6; contains stopping tests. See thevalues in Description.

eps

INTEGER. Specifies the maximum number of iterations.iter1

INTEGER. Specifies the maximum number of iterations of trial-stepcalculation.

iter2

DOUBLE PRECISION. Positive input variable used in determining theinitial step bound. In most cases the factor should lie within the interval(0.1, 100.0). The generally recommended value is 100.

rs

Output Parameters

Data object of _TRNSPBC_HANDLE_t type for C/C++ programmers andINTEGER*8 for FORTRAN programmers.

handle


dtrnlspbc_solveSolves a nonlinear least-squares problem withlinear (bound) constraints using Trust-Regionalgorithm.

Syntax

Fortran:

res = dtrnlspbc_solve(handle, fvec, fjac, RCI_Request)

C:

res = dtrnlspbc_solve(&handle, fvec, fjac, &RCI_Request);

Description

The dtrnlspbc_solve routine based on Reverse Communication Interface (RCI) uses theTrust-Region algorithm to solve nonlinear least squares problems with linear (bound) constraints.The problem is stated as follows:

2668


where m ≥ n, F : Rn → Rm and fi(x) is the i-th component function of F(x).

The RCI_Request parameter informs about the task completion and may have the followingvalues:

RCI_Request= 2 - calculate the Jacobian matrix and put the result into fjac.

RCI_Request= 1 - recalculate the function at vector X and put the result into fvec.

RCI_Request= 0 - successful completion of the task

RCI_Request= -1 - the algorithm has exceeded the maximal number of iterations.

RCI_Request= -2 - Δk < eps(1)

RCI_Request= -3 - ||F(x)|| < eps(2)

RCI_Request= -4 - ||A(x)ij|| < eps(3)

RCI_Request= -5 - ||s|| < eps(4)

RCI_Request= -6 -||F(x)||- ||F(x) - A(x)s|| < eps(5),


Input Parameters


handle

DOUBLE PRECISION. Array of size m. Contains the function values atX, where fvec(i) = (yi – fi(x)).

fvec

DOUBLE PRECISION. Array of size m by n. Contains the Jacobi matrixof the function.

fjac

Output Parameters

DOUBLE PRECISION. Array of size m. Contains the updated functionvalues at X

fvec


RCI_Request

2669

See Description for the other values of the parameter and their meaning.

INTEGER. Informs about the task completion.resres = TR_SUCCESS means the routine completed the task normally.

dtrnlspbc_getRetrieves the number of iterations, stop criterion,initial residual, and final residual.

Syntax

Fortran:

res = dtrnlspbc_get(handle, iter, st_cr, r1, r2)

C:

res = dtrnlspbc_get(&handle, &iter, &st_cr, &r1, &r2);

Description

The routine retrieves the number of current iterations, stop criterion, initial residual, and finalresidual.

The st_cr parameter contains the stop criterion:

st_cr = 1 - the algorithm has exceeded the maximal number of iterations.

st_cr = 2 - Δk < eps(1)

st_cr = 3 - ||F(x)|| < eps(2)

st_cr = 4 - ||A(x)ij|| < eps(3)

st_cr = 5 - ||s|| < eps(4)

st_cr = 6 - ||F(x)||- ||F(x) - A(x)s|| < eps(5),


Input Parameters


handle

2670

Output Parameters

INTEGER. Contains the current number of iterations.iter

INTEGER. Contains the stop criterion.st_crSee Description for the parameter values and their meanings.

DOUBLE PRECISION. Contains the initial residual, that is, the functionalvalue (||y - f(x)||) of the initial х set by the user.

r1

DOUBLE PRECISION. Contains the final residual, that is, the functionalvalue (||y - f(x)||) of the final х resulting from the algorithmoperation.

r2


dtrnlspbc_deleteRemoves data object required by TR solver.

Syntax

Fortran:

res = dtrnlspbc_delete(handle)

C:

res = dtrnlspbc_delete(&handle);

Description

The routine removes a data object needed for the RCI TR solver.

Input Parameters


handle

Output Parameters


2671


Examples of dtrnlspbc Usage

Example 14-3. dtrnlspbc Usage in FortranC** NONLINEAR LEAST SQUARE PROBLEM WITH BOUNDARY CONSTRAINTS

PROGRAM EXAMPLE_DTRNLSPBC_POWELL

IMPLICIT NONE

C** HEADER-FILE WITH DEFINITIONS (CONSTANTS, EXTERNALS)

INCLUDE 'mkl_rci.f'

C** USER’S OBJECTIVE FUNCTION



INTEGER N

PARAMETER (N = 40)


INTEGER M

PARAMETER (M = 40)


DOUBLE PRECISION X (N)

C** PRECISIONS FOR STOP-CRITERIA (SEE MANUAL FOR MORE DETAILS)

DOUBLE PRECISION ESP (6)

C** JACOBI CALCULATION PRECISION

DOUBLE PRECISION JAC_EPS

C** LOWER AND UPPER BOUNDS

DOUBLE PRECISION LW (N), UP (N)

C** REVERSE COMMUNICATION INTERFACE PARAMETER

INTEGRER RCI_REQUEST


DOUBLE PRECISION FVEC (M)

2672


C** JACOBI MATRIX

DOUBLE PRECISION FJAC (M, N)

C** NUMBER OF ITERATIONS

INTEGRER ITER

C** NUMBER OF STOP-CRITERION

INTEGRER ST_CR


INTEGRER SUCCESSFUL

C** MAXIMUM NUMBER OF ITERATIONS

INTEGRER ITER1

C** MAXIMUM NUMBER OF ITERATIONS OF CALCULATION OF TRIAL-STEP

INTEGRER ITER2

C** INITIAL STEP BOUND

DOUBLE PRECISION RS

C** INITIAL AND FINAL RESIDUALS

DOUBLE PRECISION R1, R2

C** TR SOLVER HANDLE

INTEGRER*8 HANDLE

C** CYCLE’S COUNTERS

INTEGRER I, J

C** SET PRECISIONS FOR STOP-CRITERIA

EPS (1:6) = 1.D-5

C** SET MAXIMUM NUMBER OF ITERATIONS

ITER1 = 1000

C** SET MAXIMUM NUMBER OF ITERATIONS OF CALCULATION OF TRIAL-STEP

ITER2 = 100

C** SET INITIAL STEP BOUND

RS = 100.D0

2673


C** PRECISIONS FOR JACOBI CALCULATION

JAC_EPS = 1.D-8

C** SET THE INITIAL GUESS

DO I = 1, N/4

X (4*I - 3) = 3.D0

X (4*I - 2) = -1.D0

X (4*I - 1) = 0.D0

X (4*I) = 1.D0

ENDDO

C** SET LOWER AND UPPER BOUNDS

DO I = 1, N/4

LW(4*I-3) = 0.1D0

LW(4*I-2) = -20.D0

LW(4*I-1) = -1.D0

LW(4*I) = -1.D0

UP(4*I-3) = 100.D0

2674


UP(4*I-2) = 20.D0

UP(4*I-1) = 1.D0

UP(4*I) = 50.D0

ENDDO

C** SET INITIAL VALUES

DO I = 1, M

FVEC (I) = 0.D0

DO J = 1, N

FJAC (I, J) = 0.D0

ENDDO

ENDDO





C** X IN: SOLUTION VECTOR. CONTAINS VALUES X FOR F(X)

C** LW IN: LOWER BOUND

C** UP IN: UPPER BOUND

C** EPS IN: PRECISIONS FOR STOP-CRITERIA

C** ITER1 IN: MAXIMUM NUMBER OF ITERATIONS

C** ITER2 IN: MAXIMUM NUMBER OF ITERATIONS OF CALCULATION OFTRIAL-STEP

C** RS IN: INITIAL STEP BOUND

IF (DTRNLSPBC_INIT (HANDLE, N, M, X, LW, UP, EPS, ITER1, ITER2

+ , RS) /= TR_SUCCESS) THEN


PRINT *, '| ERROR IN DTRNLSPBC_INIT'

C** AND STOP

STOP

2675


ENDIF


RCI_REQUEST = 0

SUCCESSFUL = 0

C** RCI CYCLE


C** CALL TR SOLVER


C** FVEC IN: VECTOR

C** FJAC IN: JACOBI MATRIX


IF (DTRNLSPBC_SOLVE (HANDLE, FVEC, FJAC, RCI_REQUEST)



PRINT *, '| ERROR IN DTRNLSPBC_SOLVE'

C** AND STOP

STOP

ENDIF


C** ACCORDING TO RCI_REQUEST VALUE WE DO NEXT STEP

SELECT CASE (RCI_REQUEST)

CASE (-1, -2, -3, -4, -5, -6)

C** STOP RCI CYCLE

SUCCESSFUL = 1

CASE (1)

C** RECALCULATE FUNCTION VALUE


2676




C** FVEC OUT: FUNCTION VALUE F(X)

CALL EXTENDET_POWELL (M, N, X, FVEC)

CASE (2)

C** COMPUTE JACOBI MATRIX










PRINT *, '| ERROR IN DTRNLSPBC_SOLVE'

C** AND STOP

STOP

ENDIF

ENDSELECT

ENDDO

C** GET SOLUTION STATUSES

C** HANDLE IN: TR SOLVER HANDLE

C** ITER OUT: NUMBER OF ITERATIONS

C** ST_CR OUT: NUMBER OF STOP CRITERION

C** R1 OUT: INITIAL RESIDUALS

C** R2 OUT: FINAL RESIDUALS

IF (DTRNLSPBC_GET (HANDLE, ITER, ST_CR, R1_R2)

2677


PRINT *, '| ERROR IN DTRNLSPBC_GET'

C** AND STOP

STOP

ENDIF


IF (DTRNLSPBC_DELETE (HANDLE) /= TR_SUCCESS) THEN


PRINT *, '| ERROR IN DTRNLSPBC_DELETE'

C** AND STOP

STOP

ENDIF

C** IF FINAL RESIDUAL IS LESS THAN REQUIRED PRECISION THEN PRINT PASS

IF (R2 < 1.D-1) THEN

PRINT *, '| DTRNLSPBC POWELL............PASS'!, R1,R2

C** ELSE PRINT FAILED

ELSE

PRINT *, '| DTRNLSPBC POWELL............FAILED'!, R1,R2

ENDIF

END PROGRAM EXAMPLE_DTRNLSPBC_POWELL




2678




IMPLICIT NONE

INTEGER M, N


INTEGER I

DO I = 1, N/4

F (4*I-3) = X(4*I - 3) + 10.D0 * X(4*I - 2)

F (4*I-2) = 2.2360679774997896964091736687313D0*(X(4*I-1) -X(4*I))

F (4*I-1) = (X(4*I-2) - 2.D0*X(4*I-1))**2

F (4*I) = 3.1622776601683793319988935444327D0*(X(4*I-3) -X(4*I))**2

ENDDO


2679


Example 14-4. dtrnlspbc Usage in C#include <stdio.h>

#include <malloc.h>

#include <math.h>


/* nonlinear least square problem with boundary constraints */

int main ()

{


extern void extendet_powell (int *, int *, double*, double*);



int n = 4, m = 4;

/* precisions for stop-criteria (see manual for more details) */

double eps[6];

/* solution vector. contains values x for f(x) */

double *x;

/* iter1 - maximum number of iterations

iter2 - maximum number of iterations of calculation of trial-step */

int iter1 = 1000, iter2 = 100;

/* initial step bound */

double rs = 0.0;

/* reverse communication interface parameter */

int RCI_Request;

2680

int successful;

/* function (f(x)) value vector */

double *fvec;

/* jacobi matrix */

double *fjac;

/* lower and upper bounds */

double *LW, *UP;

/* number of iterations */

int iter;

/* number of stop-criterion */

int st_cr;

/* initial and final residuals */

double r1, r2;

/* TR solver handle */

_TRNSPBC_HANDLE_t handle;

/* cycle’s counter */

int i;

/* memory allocation */

x = (double*) malloc (sizeof (double)*n);

fvec = (double*) malloc (sizeof (double)*m);

fjac = (double*) malloc (sizeof (double)*m*n);

LW = (double*) malloc (sizeof (double)*n);

UP = (double*) malloc (sizeof (double)*n);

/* set precisions for stop-criteria */

for (i = 0; i < 6; i++)

{

2681

eps [i] = 0.00001;

}

/* set the initial guess */

for (i = 0; i < n/4; i++)

{

x [4*i] = 3.0;

x [4*i + 1] = -1.0;

x [4*i + 2] = 0.0;

x [4*i + 3] = 1.0;

}

/* set the initial values */

for (i = 0; i < m; i++)

fvec [i] = 0.0;

for (i = 0; i < m*n; i++)

fjac [i] = 0.0;

/* set bounds */

for (i = 0; i < n/4; i++)

{

LW [4*i] = 0.1;

LW [4*i + 1] = -20.0;

LW [4*i + 2] = -1.0;

LW [4*i + 3] = -1.0;

UP [4*i] = 100.0;

UP [4*i + 1] = 20.0;

UP [4*i + 2] = 1.0;

UP [4*i + 3] = 50.0;

}

/* initialize solver (allocate memory, set initial values)

2682

handle in/out: TR solver handle



x in: solution vector. contains values x for f(x)

LW in: lower bound

UP in: upper bound

eps in: precisions for stop-criteria

iter1 in: maximum number of iterations

iter2 in: maximum number of iterations of calculation oftrial-step

rs in: initial step bound */

if (dtrnlspbc_init (&handle, &n, &m, x, LW, UP, eps, &iter1, &iter2,&rs) != TR_SUCCESS)

{


printf ("| error in dtrnlspbc_init\n");

/* and exit */

return 0;

}


RCI_Request = 0;

successful = 0;

/* rci cycle */

while (successful == 0)

{

/* call tr solver

handle in/out: tr solver handle

fvec in: vector

fjac in: jacobi matrix

2683


RCI_request in/out: return number which denotes next step forperforming */

if (dtrnlspbc_solve (&handle, fvec, fjac, &RCI_Request) != TR_SUCCESS)

{

/* if function does not complete successfully then print errormessage */

printf ("| error in dtrnlspbc_solve\n");

/* and exit */

return 0;

}

/* according to rci_request value we do next step */

if (RCI_Request == -1 ||





RCI_Request == -6)


successful = 1;


{

/* recalculate function value




fvec out: function value f(x) */

extendet_powell (&m, &n, x, fvec);

}


2684


{

/* compute jacobi matrix





2685




if (djacobi (extendet_powell, &n, &m, fjac, x, eps) != TR_SUCCESS)

{

/* if function does not complete successfully then printerror message */

printf ("| error in djacobi\n");

/* and exit */

return 0;

}

}

}

/* get solution statuses

handle in: TR solver handle

iter out: number of iterations

st_cr out: number of stop criterion

r1 out: initial residuals

r2 out: final residuals */

if (dtrnlspbc_get (&handle, &iter, &st_cr, &r1, &r2) != TR_SUCCESS)

{


printf ("| error in dtrnlspbc_get\n");

/* and exit */

return 0;

}


if (dtrnlspbc_delete (&handle) != TR_SUCCESS)

{

2686


printf ("| error in dtrnlspbc_delete\n");

/* and exit */

return 0;

}

/* free allocated memory */

free (x);

free (fvec);

free (fjac);

free (LW);

free (UP);

/* if final residual is less than required precision then print pass */

if (r2 < 0.1)

printf ("| dtrnlspbc powell............PASS\n");

/* else print failed */

else

printf ("| dtrnlspbc powell............FAILED\n");

return 0;

}

/* nonlinear system equations with constraints */







{

2687

int i;

for (i = 0; i < (*n)/4; i++)

{

f [4*i] = x [4*i] + 10.0*x [4*i + 1];

f [4*i + 1] = 2.2360679774997896964091736687313*(x [4*i + 2] - x [4*i+ 3]);

f [4*i + 2] = (x [4*i + 1] - 2.0*x [4*i + 2])*(x [4*i + 1] - 2.0*x[4*i + 2]);

f [4*i + 3] = 3.1622776601683793319988935444327*(x [4*i] - x [4*i +3])*(x [4*i] - x [4*i + 3]);

}

return;

}

Jacobi Matrix Calculation RoutinesThis section describes routines that compute Jacobi matrix by central differences. Jacobi matrixcalculation is required while solving nonlinear least-square problem and systems of nonlinearequations (with or without linear bound constraints).Routines for calculation of Jacobi matrixhave "Black-Box" interfaces, where users put the objective function via parameters. But in thatcase the user’s objective function must have fixed interface.

Table 14-3 Jacobi Matrix Calculation Routines


Initializes the solver.djacobi_init

Computes a Jacobi matrix of the function on the basis of RCIusing central difference.

djacobi_solve

Removes data.djacobi_delete

Computes a Jacobi matrix of the fcn function using centraldifference.

djacobi

2688

djacobi_initInitializes the solver of Jacobian calculations.

Syntax

Fortran:

res = djacobi_init(handle, n, m, x, a, esp)

C:

res = djacobi_init(&handle, &n, &m, x, a, &eps);

Description

The routine initializes the solver.

Input Parameters


INTEGER. Length of F.m

DOUBLE PRECISION. Array of size n. Vector at which the function isevaluated.

x

DOUBLE PRECISION. Precision of Jacobi matrix calculation.eps

Output Parameters

Data object of _JACOBIMATRIX_HANDLE_t type for C/C++ programmersand INTEGER*8 for FORTRAN programmers.

handle


a


2689


djacobi_solveComputes Jacobi matrix of the function on the basisof RCI using central difference.

Syntax

Fortran:

res = djacobi_solve(handle, f1, f2, RCI_Request)

C:

res = djacobi_solve(&handle, f1, f2, &RCI_Request);

Description

The djacobi_solve routine computes a Jacobi matrix of the function on the basis of RCI usingcentral difference.

Input Parameters


handle

Output Parameters

DOUBLE PRECISION. Array of size m. Contains the updated function

values at X + ε.f1

DOUBLE PRECISION. Array of size m. Contains the updated function

values at X - ε.f2


RCI_Request

RCI_Request= 1 means user should calculate the Jacobian matrix andput the result into f1.RCI_Request= 2 - means user should calculate the Jacobian matrixand put the result into f2.

INTEGER. Informs about the task completion.resres = TR_SUCCESS means the routine completed the task normally.res = TR_INVALID_OPTION means an error in the input parameters.

2690


djacobi_deleteRemoves data object required by Jacobiancalculation.

Syntax

Fortran:

res = djacobi_delete(handle)

C:

res = djacobi_delete(&handle);

Description

The routine removes a data object needed for the Jacobi matrix RCI solver.

Input Parameters


handle

Output Parameters


djacobiComputes Jacobi matrix of user's objective functionusing cenral difference.

Syntax

Fortran:

res = djacobi(fcn, n, m, fjac, x, jac_eps)

C:

res = djacobi(fcn, &n, &m, fjac, x, &jac_eps);

2691


Description

The routine computes a Jacobi matrix for function fcn using central difference. This routinehas "Black-Box" interface, where user inputs the objective function via parameters. But in thatcase the user’s objective function must have a fixed interface.

Input Parameters

User-supplied subroutine to evaluate the function that defines theleast-squares problem. Call fcn (m, n, x, f), where

fcn

m - INTEGER. Input parameter. Length of fn - INTEGER. Input parameter. Length of x.x - DOUBLE PRECISION. Input parameter. Array of size n. Vector atwhich the function is evaluated; x should not be changed by fcn.f - DOUBLE PRECISION. Output parameter. Array of size m; containsthe function values at x.Declare fcn as EXTERNAL in the calling program.


INTEGER. Length of F.m

DOUBLE PRECISION. Array of size n. Vector at which the function isevaluated.

x

DOUBLE PRECISION. Precision of Jacobi matrix calculation.eps

Output Parameters


a

INTEGER. Informs about task completion.resres = TR_SUCCESS means the routine completed the task normally.res = TR_INVALID_OPTION means an error in the input parameters.res = TR_OUT_OF_MEMORY means a memory error.

2692


Examples of djacobi_solve Usage

Example 14-5. djacobi_solve Usage in FortranPROGRAM JACOBI_MATRIX

IMPLICIT NONE

INCLUDE '../include/mkl_opt_tr.f'




INTEGER N, M

PARAMETER (N = 4, M = 4)

C** JACOBI MATRIX



C** TEMPORARY ARRAYS F1 & F2 WHICH CONTAINS F1 = F(X+EPS) | F2 = F(X-EPS)

DOUBLE PRECISION A (M,N), X(N), F1(N), F2(N)

DOUBLE PRECISION EPS

C** PRECISIONS FOR JACOBI_MATRIX CALCULATION

PARAMETER (EPS = 1.D-6)

C** JACOBI-MATRIX SOLVER HANDLE

INTEGER*8 HANDLE


INTEGER SUCCESSFUL, RCI_REQUEST

C** SET THE X VALUES

X = 10.d0

2693



IF (DJACOBI_INIT (HANDLE, N, M, X, EPS) /= TR_SUCCESS) THEN


PRINT *, '#ERROR IN DJACOBI_INIT'

STOP

ENDIF


RCI_REQUEST = 0

SUCCESSFUL = 0

C** RCI CYCLE


C** CALL SOLVER

IF (DJACOBI_SOLVE (HANDLE, F1, F2, RCI_REQUEST) /= TR_SUCCESS) THEN


PRINT *, '#ERROR IN DJACOBI_SOLVE'

STOP

ENDIF

IF RCI_REQUEST == 1) THEN

C** CALCULATE FUNCTION VALUE F1 = F(X+EPS)

CALL EXTENDET_POWELL (M, N, X, F1)

ELSEIF (RCI_REQUEST == 2) THEN

C** CALCULATE FUNCTION VALUE F1 = F(X-EPS)

CALL EXTENDET_POWELL (M, N, X, F2)

ELSEIF (RCI_REQUEST == 0) THEN

C** EXIT RCI CYCLE

SUCCESSFUL = 1

ENDIF

ENDDO

2694



IF (DJACOBI_DELETE (HANDLE) /= TR_SUCCESS) THEN


PRINT *, '#ERROR IN DJACOBI_DELETE'

STOP

ENDIF

ENDPROGRAM JACOBI_MATRIX







IMPLICIT NONE

INTEGER M, N


INTEGER I

DO I = 1, N/4

F (4*I-3) = X(4*I - 3) + 10.D0 * X(4*I - 2)

F (4*I-2) = DSQRT(5.D0) * (X(4*I-1) - X(4*I))

F (4*I-1) = (X(4*I-2) - 2.D0*X(4*I-1))**2

F (4*I) = DSQRT(10.D0)*(X(4*I-3) - X(4*I))**2

ENDDO


2695


Example 14-6. djacobi_solve Usage in C

#include "../include/mkl_opt_tr.h"

#include <stdlib.h>

#include <stdio.h>

int main ()

{


extern void extendet_powell (int*, int*, double*, double*);



int n = 4, m = 4;

/* jacobi matrix */

solution vector. contains values x for f(x)

temporary arrays f1 & f2 that contain f1 = f(x+eps) | f2 = f(x-eps) */

double *a, *x, *f1, *f2;

/* precisions for jacobi_matrix calculation */

double eps = 0.000001;

/* jacobi-matrix solver handle */

_JACOBIMATRIX_HANDLE_t handle;


int successful, rci_request, i;

a = (double*) malloc (sizeof (double) * n*m);

x = (double*) malloc (sizeof (double) * n);

2696

f1 = (double*) malloc (sizeof (double) * n);

f2 = (double*) malloc (sizeof (double) * n);

/* set the x values */

for (i = 0; i < n; i++) x[i] = 10.0;

/* initialize solver (allocate mamory, set initial values) */

if (djacobi_init (&handle, &n, &m, x, a, &eps) != TR_SUCCESS){

/* if function does not complete successfully then print error message */

printf ("\n#ERROR IN DJACOBI_INIT\n");

return 0;

}


rci_request = 0;

successful = 0;

/* rci cycle */

while (successful == 0) {

/* call solver */

if (djacobi_solve (&handle, f1, f2, &rci_request) != TR_SUCCESS){


printf ("\n#ERROR IN DJACOBI_SOLVE\n");

return 0;

}

if (rci_request == 1)

/* calculate function value f1 = f(x+eps) */

extendet_powell (&m, &n, x, f1);

else if (rci_request == 2)

/* calculate function value f1 = f(x-eps) */

extendet_powell (&m, &n, x, f2);

else if (rci_request == 0)

2697


successful = 1;

}


2698


if (djacobi_delete (&handle) != TR_SUCCESS) {


printf ("\n#ERROR IN DJACOBI_DELETE\n");

return 0;

}

return 0;

}







{

int i;

for (i = 0; i < (*n)/4; i++)

{

f [4*i] = x [4*i] + 10.0*x [4*i + 1];

f [4*i + 1] = 2.2360679774*(x [4*i + 2] - x [4*i + 3]);

f [4*i + 2] = (x [4*i + 1] - 2.0*x [4*i + 2])*(x [4*i + 1] - 2.0*x[4*i + 2]);

f [4*i + 3] = 3.1622776601*(x [4*i] - x [4*i + 3])*(x [4*i] - x [4*i+ 3]);

}

return;

}

2699

Examples of djacobi Usage

Example 14-7. djacobi Usage in FortranC** COMPUTE JACOBI MATRIX










PRINT *, '| ERROR IN DJACOBI'

ENDIF

.......................................................................







IMPLICIT NONE

INTEGER M, N


INTEGER I

2700


DO I = 1, N/4

F (4*I-3) = X(4*I - 3) + 10.D0 * X(4*I - 2)

F (4*I-2) = DSQRT(5.D0) * (X(4*I-1) - X(4*I))

F (4*I-1) = (X(4*I-2) - 2.D0*X(4*I-1))**2

F (4*I) = DSQRT(10.D0)*(X(4*I-3) - X(4*I))**2

ENDDO


Example 14-8. djacobi Usage in C/* compute jacobi matrix





2701


if (djacobi (extendet_powell, &n, &m, fjac, x, &jac_eps) /= TR_SUCCESS){


printf ("\n#ERROR IN DJACOBI\n");

return 0;

}

/* .....................................................................*/







{

int i;

for (i = 0; i < (*n)/4; i++)

{

f [4*i] = x [4*i] + 10.0*x [4*i + 1];

f [4*i + 1] = 2.2360679774*(x [4*i + 2] - x [4*i + 3]);

f [4*i + 2] = (x [4*i + 1] - 2.0*x [4*i + 2])*(x [4*i + 1] - 2.0*x[4*i + 2]);

f [4*i + 3] = 3.1622776601*(x [4*i] - x [4*i + 3])*(x [4*i] - x [4*i+ 3]);

}

2702

return;

}

2703


15Support Functions

Intel® MKL support functions are used to:

– retrieve information about the current Intel MKL version

– handle errors

– test characters and character strings for equality

– measure user time for a process and elapsed CPU time

– set and measure CPU frequency

– free memory allocated by Intel MKL memory management software.

Functions described below are subdivided according to their purpose into the following groups:

Version Information Functions

Error Handling Functions

Equality Test Functions

Timing Functions

Memory Functions

Table 15-1 contains the list of support functions common for Intel MKL.

Table 15-1 Intel MKL Support Functions


Version Information Functions

Returns information about the active library version.MKLGetVersion

Returns information about the library version string.MKLGetVersionString


Handles error conditions for BLAS, LAPACK, VML routines.xerbla

Handles error conditions for ScaLAPACK routines.pxerbla


Tests two characters for equality regardless of case.lsame

2705


Tests two character strings for equality regardless of case.lsamen

Timing Functions

Returns user time for a process.second/dsecnd

Returns full precision elapsed CPU clocks.getcpuclocks

Returns CPU frequency value in GHz.getcpufrequency

Sets CPU frequency value in GHz.setcpufrequency

Memory Functions

Frees memory buffers.MKL_FreeBuffers

Version Information FunctionsIntel® MKL provides two methods for extracting information about the library version number.First, you may extract a version string using the MKLGetVersionString function . Or,alternatively, you can use the MKLGetVersion function to obtain an MKLVersion structurethat contains the version information. A makefile is also provided to automatically build theexamples and output summary files containing the version information for the current library.

MKLGetVersionReturns information about the active libraryversion.

Syntax

void MKLGetVersion(MKLVersion*pVersion)

Description

The MKLGetVersion function collects the information about the active version of Intel MKLsoftware and returns this information in a structure of MKLVersion type by the pVersionaddress . MKLVersion structure type is defined in mkl_types.h file. The following fields ofMKLVersion structure are available:

2706


is the major number of the current library version.MajorVersion

is the minor number of the current library version.MinorVersion

is the update number of the current library version.BuildNumber

is the status of the current library version. Possible variants could be“Beta”, “Product”.

ProductStatus

is the string that contains the build date and the internal build number.Build

is the processor optimization that is targeted for the specific processor.It is not the definition of the processor installed in the system, ratherthe MKL library detection that is optimal for the processor installed inthe system.

Processor

Output Parameters

Pointer to the MKLVersion structure.pVersion

2707

Support Functions 15

MKLGetVersion Usage

----------------------------------------------------------------------------------------------

#include <stdio.h>

#include <stdlib.h>

#include "mkl_blas.h"

#include "mkl_types.h"

int main(void)

{

MKLVersion pVersion;

MKLGetVersion(pVersion);

printf("Major version: %d\n",pVersion->MajorVersion);

printf("Minor version: %d\n",pVersion->MinorVersion);

printf("Update number: %d\n",pVersion->BuildNumber);

printf("Product status: %s\n",pVersion->ProductStatus);

printf("Build: %s\n",pVersion->Build);

printf("Processor optimization: %s\n",pVersion->Processor);

printf("================================================================\n");

printf("\n");

return 0;

}

Output:

9Major Version0Minor Version0Build numberProductProduct status061909.09Build

2708

Intel® Xeon® Processor with Intel® Extended Memory 64 TechnologyProcessoropimization

MKLGetVersionStringGets the library version string.

Syntax

Fortran:

call MKLGetVersionString( buf )

C:

MKLGetVersionString( buf, len );

Output Parameters

Source stringbuf

Length of the source stringlen

Description

The function MKLGetVersionString returns a string that contains the library versioninformation.

See example below:

Examples

Fortran:

program getversionstring

character*198 buf

call mklgetversionstring(buf)

write(*,'(a)') buf

end

2709


C:

#include <stdio.h>


int main(void)

{

int len=198;

char buf[198];

MKLGetVersionString(buf, len);

printf("%s\n",buf);

printf("\n");

return 0;

}


xerblaError handling routine called by BLAS, LAPACK,VML routines.

Syntax

call xerbla( srname, info )

Description

The routine xerbla is an error handler for the BLAS, LAPACK, and VML routines. It is called bya BLAS, LAPACK, or VML routine if an input parameter has an invalid value.

A message is printed and execution stops.

Installers may consider modifying the stop statement in order to call system-specificexception-handling facilities.

2710

Input Parameters

CHARACTER*6. The name of the routine which called xerbla.srname

INTEGER. The position of the invalid parameter in theparameter list of the calling routine.

info

pxerblaError handling routine called by ScaLAPACKroutines.

Syntax

call pxerbla(ictxt, srname, info)

Description

This routine is an error handler for the ScaLAPACK routines. It is called by a ScaLAPACK routineif an input parameter has an invalid value. A message is printed. Program execution is notterminated. For the ScaLAPACK driver and computational routines, a RETURN statement isissued following the call to pxerbla.

Control returns to the higher-level calling routine, and it is left to the user to determine howthe program should proceed. However, in the specialized low-level ScaLAPACK routines (auxiliaryroutines that are Level 2 equivalents of computational routines), the call to pxerbla() isimmediately followed by a call to BLACS_ABORT() to terminate program execution since recoveryfrom an error at this level in the computation is not possible.

It is always good practice to check for a nonzero value of info on return from a ScaLAPACKroutine. Installers may consider modifying this routine in order to call system-specificexception-handling facilities.

Input Parameters

(global) INTEGERictxtThe BLACS context handle, indicating the global context ofthe operation. The context itself is global.

(global) CHARACTER*6srnameThe name of the routine which called pxerbla.


2711


The position of the invalid parameter in the parameter listof the calling routine.


lsameTests two characters for equality regardless ofcase.

Syntax

val = lsame(ca, cb)

Description

This logical function returns .TRUE. if ca is the same letter as cb regardless of case.

Input Parameters

CHARACTER*1. Specify the single characters to be compared.ca, cb

Output Parameters


lsamenTests two character strings for equality regardlessof case.

Syntax

val = lsamen(n, ca, cb)

Description

This logical function tests if the first n letters of the string ca are the same as the first n lettersof cb, regardless of case. The function lsamen returns .TRUE. if ca and cb are equivalentexcept for case and .FALSE. otherwise. lsamen also returns .FALSE. if len(ca) or len(cb)is less than n.

2712


Input Parameters

INTEGER. The number of characters in ca and cb to becompared.

n

CHARACTER*(*). Specify two character strings of length atleast n to be compared. Only the first n characters of eachstring will be accessed.

ca, cb

Output Parameters


Timing Functions

second/dsecndReturns elapsed CPU time in seconds.

Syntax

val = second()

val = dsecnd()

Description

The functions second/dsecnd return the elapsed CPU time in seconds. These versions get thetime from the elapsed CPU clocks divided by CPU frequency. The difference is that dsecndreturns the result with double presision.

The functions should be applied in pairs: the first time, before a routine to be measured, andthe second time - after the measurement. The difference between the returned values is thetime spent in the routine. The usage of second is discouraged for measuring short time intervalsbecause the single precision format is not capable of holding sufficient timer precision.

Output Parameters

REAL for secondvalDOUBLE PRECISION for dsecndElapsed CPU time in seconds.

2713


getcpuclocksReturns full precision elapsed CPU clocks.

Syntax

getcpuclocks(clocks)

Description

The getcpuclocks subroutine returns the elapsed CPU clocks.

This may be useful when timing short intervals with a high resolution. The getcpuclockssubroutine is also applied in pairs like second/dsecnd. Note that out-of-order code executionon IA-32 or on Intel® 64 architecture processors may disturb the exact elapsed CPU clocksvalue a little bit, which may be important while measuring extremely short time intervals.

Output Parameters

INTEGER*8. Elapsed CPU clocks.clocks

getcpufrequencyReturns CPU frequency value in GHz.

Syntax

val = getcpufrequency()

Description

The function getcpufrequency returns the CPU frequency in GHz. This value is used by second/dsecnd functions while converting CPU clocks into seconds.

Obtaining a frequency may take some time for the first time second /dsecnd/getcpufrequencyis called. To avoid it, call setcpufrequency before setting the exact CPU frequency if it isknown in advance.

Output Parameters

DOUBLE PRECISION. CPU frequency value in GHz.val

2714


setcpufrequencySets CPU frequency value in GHz.

Syntax

setcpufrequency(freq)

Description

The setcpufrequency subroutine sets the CPU frequency in GHz, used then by second /dsecndfunctions while converting CPU clocks into seconds. Setting the exact CPU frequency is usefulto bypass obtaining a frequency by getcpufrequency.

Initially, CPU frequency value is unset. The CPU frequency value can be set only bysetcpufrequency call, or during the first call of getcpufrequency, if setcpufrequency hasnot been called before. Calling setcpufrequency with a special parameter freq = -1.0 forcesthe CPU frequency value be unset.

Output Parameters

DOUBLE PRECISION. CPU frequency value in GHz.freq

Memory FunctionsThis section describes the Intel MKL function that frees memory allocated by Intel® MKL MemoryManager. See the MKL User’s Guide for details of MKL memory management.

MKL_FreeBuffersFrees memory buffers.

Syntax

void MKL_FreeBuffers(void)

2715


Description

The function MKL_FreeBuffers frees the memory allocated by the MKL Memory Manager. TheMemory Manager allocates new buffers if no free buffers are currently available. CallMKL_FreeBuffers() to free all memory buffers and to avoid memory leaking on completionof work with Intel MKL functions, that is, after the last call of an MKL function from yourapplication.

See MKL User’s Guide for details.

2716


MKL_FreeBuffers Usage with DFT Functions

----------------------------------------------------------------------------------------------

DFTI_DESCRIPTOR *hand1, *hand2;

void MKL_FreeBuffers(void);

. . . . . .

/* Using MKL DFT */

Status = DftiCreateDescriptor(&hand1, DFTI_SINGLE, DFTI_COMPLEX, dim, m1);

Status = DftiCommitDescriptor(hand1);

Status = DftiComputeForward(hand1, s_array1);

. . . . . .

Status = DftiCreateDescriptor(&hand2, DFTI_SINGLE, DFTI_COMPLEX, dim, m2);

Status = DftiCommitDescriptor(hand2);

. . . . . .

Status = DftiFreeDescriptor(&hand1);

/* Do not call MKL_FreeBuffers() here as the hand2 descriptor will bedestroyed! */

. . . . . .

Status = DftiComputeBackward(hand2, s_array2));

Status = DftiFreeDescriptor(&hand2);

/* Here user finishes the MKL DFT usage */

/* Memory leak will be triggered by any memory control tool */

/* Use MKL_FreeBuffers() to avoid memory leaking */

MKL_FreeBuffers();

----------------------------------------------------------------------------------------------

If the memory space is sufficient, use MKL_FreeBuffers after the last call of the MKL functions.Otherwise, a drop in performance can occur due to reallocation of buffers for the subsequentMKL functions.

WARNING. For FFT calls, do not use MKL_FreeBuffers betweenDftiCreateDescriptor(hand) and DftiFreeDescriptor(&hand).

2717


16BLACS Routines

This chapter describes the Intel® Math Kernel Library implementation of routines from the BLACS (BasicLinear Algebra Communication Subprograms) package that are used to support a linear algebra orientedmessage passing interface that may be implemented efficiently and uniformly across a large range ofdistributed memory platforms.

The BLACS routines make linear algebra applications both easier to program and more portable. For thispurpose, they are used in Intel MKL as the communication layer of ScaLAPACK and Cluster DFT.

These routines perform distinct tasks that can be used for:

Initialization

Destruction

Information Purposes

Miscellaneous Tasks.

Initialization RoutinesThis section describes BLACS routines that deal with grid/context creation, and processing beforethe grid/context has been defined.

Table 16-1 BLACS Intialization Routines


Returns the number of processes available for use.blacs_pinfo

Allocates virtual machine and spawns processes.blacs_setup

Gets values that BLACS use for internal defaults.blacs_get

Sets values that BLACS use for internal defaults.blacs_set

Assigns available processes into BLACS process grid.blacs_gridinit

Maps available processes into BLACS process grid.blacs_gridmap

2719

blacs_pinfoReturns the number of processes available for use.

Syntax

call blacs_pinfo( mypnum, nprocs )

Output Parameters

INTEGER. An integer between 0 and (nprocs - 1) thatuniquely identifies each process.

mypnum

INTEGER.The number of processes available for BLACS use.nprocs

Description

This routine is used when some initial system information is required before the BLACS are setup. On all platforms except PVM, nprocs is the actual number of processes available for use,that is, nprows * npcols <= nprocs. In PVM, the virtual machine may not have been setup before this call, and therefore no parallel machine exists. In this case, nprocs is returnedas less than one. If a process has been spawned via the keyboard, it receives mypnum of 0, andall other processes get mypnum of -1. As a result, the user can distinguish between processes.Only after the virtual machine has been set up via a call to BLACS_SETUP, this routine returnsthe correct values for mypnum and nprocs.

blacs_setupAllocates virtual machine and spawns processes.

Syntax

call blacs_setup( mypnum, nprocs )

Input Parameters

INTEGER. On the process spawned from the keyboard ratherthan from pvmspawn, this parameter indicates the numberof processes to create when building the virtual machine.

nprocs

2720

Output Parameters

INTEGER. An integer between 0 and (nprocs - 1) thatuniquely identifies each process.

mypnum

INTEGER. For all processes other than spawned from thekeyboard, this parameter means the number of processesavailable for BLACS use.

nprocs

Description

This routine only accomplishes meaningful work in the PVM BLACS. On all other platforms, itis functionally equivalent to blacs_pinfo. The BLACS assume a static system, that is, thegiven number of processes does not change. PVM supplies a dynamic system, allowing processesto be added to the system on the fly.

blacs_setup is used to allocate the virtual machine and spawn off processes. It reads in a filecalled blacs_setup.dat, in which the first line must be the name of your executable. The secondline is optional, but if it exists, it should be a PVM spawn flag. Legal values at this time are 0(PvmTaskDefault), 4 (PvmTaskDebug), 8 (PvmTaskTrace), and 12 (PvmTaskDebug +PvmTaskTrace). The primary reason for this line is to allow the user to easily turn on and offPVM debugging. Additional lines, if any, specify what machines should be added to the currentconfiguration before spawning nprocs-1 processes to the machines in a round robin fashion.

nprocs is input on the process which has no PVM parent (that is, mypnum=0), and bothparameters are output for all processes. So, on PVM systems, the call to blacs_pinfo informsyou that the virtual machine has not been set up, and a call to blacs_setup then sets up themachine and returns the real values for mypnum and nprocs.

Note that if the file blacs_setup.dat does not exist, the BLACS prompt the user for the executablename, and processes are spawned to the current PVM configuration.

blacs_getGets values that BLACS use for internal defaults.

Syntax

call blacs_get( icontxt, what, val )

2721

BLACS Routines 16

Input Parameters

INTEGER. On values of what that are tied to a particularcontext, this parameter is the integer handle indicating thecontext. Otherwise, ignored.

icontxt

INTEGER. Indicates what BLACS internal(s) should bereturned in val. Present options are:

what

• what = 0 : Handle indicating default system context

• what = 1 : The BLACS message ID range

• what = 2 : The BLACS debug level the library wascompiled with

• what = 10 : Handle indicating the system context usedto define the BLACS context whose handle is icontxt

• what = 11 : Number of rings multiring topology ispresently using

• what = 12 : Number of branches general tree topologyis presently using.

Output Parameters

INTEGER. The value of the BLACS internal.val

Description

This routine gets the values that the BLACS are using for internal defaults. Some values aretied to a BLACS context, and some are more general. The most common use is in retrieving adefault system context for input into blacs_gridinit or blacs_gridmap.

Some systems, such as MPI*, supply their own version of context. For those users who mixsystem code with BLACS code, a BLACS context should be formed in reference to a systemcontext. Thus, the grid creation routines take a system context as input. If you wish to havestrictly portable code, you may use blacs_get to retrieve a default system context that willinclude all available processes. This value is not tied to a BLACS context, so the parametericontxt is unused.

blacs_get returns information on three quantities that are tied to an individual BLACS context,which is passed in as icontxt. The information that may be retrieved is:

• The handle of the system context upon which this BLACS context was defined

2722


• The number of rings for TOP = 'M' (multiring broadcast)

• The number of branches for TOP = 'T' (general tree broadcast/general tree gather).

blacs_setSets values that BLACS use for internal defaults.

Syntax

call blacs_set( icontxt, what, val )

Input Parameters

INTEGER. On values of what that are tied to a particularcontext, this parameter is the integer handle indicating thecontext. Otherwise, ignored.

icontxt

INTEGER. Indicates what BLACS internal(s) should be set.Present values are:

what

• 1 = The BLACS message ID range

• 11 = Number of rings for multiring topology to use

• 12 = Number of branches for general tree topology touse.

INTEGER. Array of dimension (*).Indicates the value(s) theinternals should be set to. The specific meanings dependon what values, as discussed in Description below.

val

Description

This routine sets the BLACS internal defaults depending on the what values:

Setting the BLACS message ID range.what = 1If you wish to mix the BLACS with other message-passing packages,restrict the BLACS to a certain message ID range not to used by thenon-BLACS routines. The message ID range must be set before thefirst call to blacs_gridinit or blacs_gridmap. Subsequent calls willhave no effect. Because the message ID range is not tied to a particularcontext, the parameter icontxt is ignored, and val is defined as:VAL (input) INTEGER array of dimension (2)

2723

BLACS Routines 16

VAL(1) : The smallest message ID (also called message type ormessage tag) the BLACS should use.

VAL(2) : The largest message ID (also called message type ormessage tag) the BLACS should use.

Set number of rings for TOP = 'M' (multiring broadcast).This quantityis tied to a context, so icontxt is used, and val is defined as:

what = 11

VAL (input) INTEGER array of dimension (1)VAL(1) : The number of rings for multiring topology to use.

Set number of rings for TOP = 'T' (general tree broadcast/generaltree gather). This quantity is tied to a context, so icontxt is used,and val is defined as:

what = 12

VAL (input) INTEGER array of dimension (1)VAL(1) : The number branches for general tree topology to use.

blacs_gridinitAssigns available processes into BLACS processgrid.

Syntax

call blacs_gridinit( icontxt, order, nprow, npcol )

Input Parameters

INTEGER. Integer handle indicating the system context tobe used in creating the BLACS context. Call blacs_get toobtain a default system context.

icontxt

CHARACTER*1. Indicates how to map processes to BLACSgrid. Options are:

order

• 'R' : Use row-major natural ordering

• 'C' : Use column-major natural ordering

• ELSE : Use row-major natural ordering

INTEGER. Indicates how many process rows the processgrid should contain.

nprow

INTEGER. Indicates how many process columns the processgrid should contain.

npcol

2724


Output Parameters

INTEGER. Integer handle to the created BLACS context.icontxt

Description

All BLACS codes must call this routine, or its sister routine blacs_gridmap. These routinestake the available processes, and assign, or map, them into a BLACS process grid. In otherwords, they establish how the BLACS coordinate system maps into the native machine processnumbering system. Each BLACS grid is contained in a context, so that it does not interfere withdistributed operations that occur within other grids/contexts. These grid creation routines maybe called repeatedly to define additional contexts/grids.

The creation of a grid requires input from all processes that are defined to be in this grid.Processes belonging to more than one grid have to agree on which grid formation will be servicedfirst, much like the globally blocking sum or broadcast.

These grid creation routines set up various internals for the BLACS, and one of them must becalled before any calls are made to the non-initialization BLACS.

Note that these routines map already existing processes to a grid: the processes are not createddynamically. On most parallel machines, the processes are actual processors (hardware), andthey are "created" when you run your executable. When using the PVM BLACS, if the virtualmachine has not been set up yet, the routine blacs_setup should be used to create the virtualmachine.

This routine creates a simple nprow x npcol process grid. This process grid uses the firstnprow * npcol processes, and assigns them to the grid in a row- or column-major naturalordering. If these process-to-grid mappings are unacceptable, call blacs_gridmap.

blacs_gridmapMaps available processes into BLACS process grid.

Syntax

call blacs_gridmap( icontxt, usermap, ldumap, nprow, npcol )

Input Parameters

INTEGER. Integer handle indicating the system context tobe used in creating the BLACS context. Call blacs_get toobtain a default system context.

icontxt

2725

BLACS Routines 16

INTEGER. Array, dimension (ldumap, npcol), indicating theprocess-to-grid mapping.

usermap

INTEGER. Leading dimension of the 2D array usermap.

ldumap ≥ nprow.

ldumap

INTEGER. Indicates how many process rows the processgrid should contain.

nprow

INTEGER. Indicates how many process columns the processgrid should contain.

npcol

Output Parameters

INTEGER. Integer handle to the created BLACS context.icontxt

Description

All BLACS codes must call this routine, or its sister routine blacs_gridinit. These routinestake the available processes, and assign, or map, them into a BLACS process grid. In otherwords, they establish how the BLACS coordinate system maps into the native machine processnumbering system. Each BLACS grid is contained in a context, so that it does not interfere withdistributed operations that occur within other grids/contexts. These grid creation routines maybe called repeatedly to define additional contexts/grids.

The creation of a grid requires input from all processes that are defined to be in this grid.Processes belonging to more than one grid have to agree on which grid formation will be servicedfirst, much like the globally blocking sum or broadcast.

These grid creation routines set up various internals for the BLACS, and one of them must becalled before any calls are made to the non-initialization BLACS.

Note that these routines map already existing processes to a grid: the processes are not createddynamically. On most parallel machines, the processes are actual processors (hardware), andthey are "created" when you run your executable. When using the PVM BLACS, if the virtualmachine has not been set up yet, the routine blacs_setup should be used to create the virtualmachine.

This routine allows the user to map processes to the process grid in an arbitrary manner.usermap(i,j) holds the process number of the process to be placed in {i, j} of the processgrid. On most distributed systems, this process number is a machine defined number between0 ... nprow-1. For PVM, these node numbers are the PVM TIDS (Task IDs). The blacs_gridmaproutine is intended for an experienced user. The blacs_gridinit routine is much simpler.blacs_gridinit simply performs a gridmap where the first nprow * npcol processes aremapped into the current grid in a row-major natural ordering. If you are an an experienced

2726


user, blacs_gridmap allows you to take advantage of your system's actual layout. That is,you can map nodes that are physically connected to be neighbors in the BLACS grid, etc. Theblacs_gridmap routine also opens the way for multigridding: you can separate your nodesinto arbitrary grids, join them together at some later date, and then re-split them into newgrids. blacs_gridmap also provides the ability to make arbitrary grids or subgrids (for example,a "nearest neighbor" grid), which can greatly facilitate operations among processes that do notfall on a row or column of the main process grid.

Destruction RoutinesThis section describes BLACS routines that destroy grids, abort processes, and free resources.

Table 16-2 BLACS Destruction Routines


Frees BLACS buffer.blacs_freebuff

Frees a BLACS context.blacs_gridexit

Aborts all processes.blacs_abort

Frees all BLACS contexts and releases all allocated memory.blacs_exit

blacs_freebuffFrees BLACS buffer.

Syntax

call blacs_freebuff( icontxt, wait )

Input Parameters

INTEGER. Integer handle that indicates the BLACS context.icontxt

INTEGER. Parameter indicating whether to wait fornon-blocking operations or not. If equals 0, the operationsshould not be waited for; free only unused buffers.Otherwise, wait in order to free all buffers.

wait

2727

BLACS Routines 16

Description

This routine releases the BLACS buffer.

The BLACS have at least one internal buffer that is used for packing messages. The number ofinternal buffers depends on what platform you are running the BLACS on. On systems wherememory is tight, keeping this buffer or buffers may become expensive. Call freebuff to releasethe buffer. However, the next call of a communication routine that requires packing reallocatesthe buffer.

The wait parameter determines whether the BLACS should wait for any non-blocking operationsto be completed or not. If wait = 0, the BLACS free any buffers that can be freed withoutwaiting. If wait is not 0, the BLACS free all internal buffers, even if non-blocking operationsmust be completed first.

blacs_gridexitFrees a BLACS context.

Syntax

call blacs_gridexit( icontxt )

Input Parameters

INTEGER. Integer handle that indicates the BLACS contextto be freed.

icontxt

Description

This routine frees a BLACS context.

Release the resources when contexts are no longer needed. After freeing a context, the contextno longer exists, and its handle may be re-used if new contexts are defined.

blacs_abortAborts all processes.

Syntax

call blacs_abort( icontxt, errornum )

2728


Input Parameters

INTEGER. Integer handle that indicates the BLACS contextto be aborted.

icontxt

INTEGER. User-defined integer error number.errornum

Description

This routine aborts all the BLACS processes, not only those confined to a particular context.

Use blacs_abort to abort all the processes in case of a serious error. Note that both parametersare input, but the routine uses them only in printing out the error message. The context handlepassed in is not required to be a valid context handle.

blacs_exitFrees all BLACS contexts and releases all allocatedmemory.

Syntax

call blacs_exit( continue )

Input Parameters

INTEGER. Flag indicating whether message passing continuesafter the BLACS are done. If continue is non-zero, the useris assumed to continue using the machine after completingthe BLACS. Otherwise, no message passing is assumed aftercalling this routine.

continue

Description

This routine frees all BLACS contexts and releases all allocated memory.

This routine should be called when a process has finished all use of the BLACS. The continueparameter indicates whether the user will be using the underlying communication platformafter the BLACS are finished. This information is most important for the PVM BLACS. If continueis set to 0, then pvm_exit is called; otherwise, it is not called. Setting continue not equal to0 indicates that explicit PVM send/recvs will be called after the BLACS are done. Make sureyour code calls pvm_exit. PVM users should either call blacs_exit or explicitly call pvm_exitto avoid PVM problems.

2729

BLACS Routines 16

Informational RoutinesThis section describes BLACS routines that return information involving the process grid.

Table 16-3 BLACS Informational Routines


Returns information on the current grid.blacs_gridinfo

Returns the system process number of the process in theprocess grid.

blacs_pnum

Returns the row and column coordinates in the process grid.blacs_pcoord

blacs_gridinfoReturns information on the current grid.

Syntax

call blacs_gridinfo( icontxt, nprow, npcol, myprow, mypcol )

Input Parameters

INTEGER. Integer handle that indicates the context.icontxt

Output Parameters

INTEGER. Number of process rows in the current processgrid.

nprow

INTEGER. Number of process columns in the current processgrid.

npcol

INTEGER. Row coordinate of the calling process in theprocess grid.

myprow

INTEGER. Column coordinate of the calling process in theprocess grid.

mypcol

2730


Description

This routine returns information on the current grid. If the context handle does not point at avalid context, all quantities are returned as -1.

blacs_pnumReturns the system process number of the processin the process grid.

Syntax

call blacs_pnum( icontxt, prow, pcol )

Input Parameters


INTEGER. Row coordinate of the process the system processnumber of which is to be determined.

prow

INTEGER. Column coordinate of the process the systemprocess number of which is to be determined.

pcol

Description

This function returns the system process number of the process at {PROW, PCOL} in the processgrid.

blacs_pcoordReturns the row and column coordinates in theprocess grid.

Syntax

call blacs_pcoord( icontxt, pnum, prow, pcol )

Input Parameters


INTEGER. Process number the coordinates of which are tobe determined. This parameter stand for the process numberof the underlying machine, that is, it is a tid for PVM.

pnum

2731

BLACS Routines 16

Output Parameters

INTEGER. Row coordinates of the pnum process in the BLACSgrid.

prow

INTEGER. Column coordinates of the pnum process in theBLACS grid.

pcol

Description

Given the system process number, this function returns the row and column coordinates in theBLACS process grid.

Miscellaneous RoutinesThis section describes blacs_barrier routine.

Table 16-4 BLACS Informational Routines


Holds up execution of all processes within the indicated scopeuntil they have all called the routine.

blacs_barrier

blacs_barrierHolds up execution of all processes within theindicated scope.

Syntax

call blacs_barrier( icontxt, scope )

Input Parameters


CHARACTER*1. Parameter that indicates whether a processrow (scope='R'), column ('C'), or entire grid ('A') willparticipate in the barrier.

scope

2732


Description

This routine holds up execution of all processes within the indicated scope until they have allcalled the routine.

2733

BLACS Routines 16

Examples of BLACS Routines Usage

Example 16-1. BLACS Usage. Hello World

The following routine takes the available processes, forms them into a process grid, and thenhas each process check in with the process at {0,0} in the process grid.

PROGRAM HELLO

* -- BLACS example code --

* Written by Clint Whaley 7/26/94

* Performs a simple check-in type hello world

* ..

* .. External Functions ..

INTEGER BLACS_PNUM

EXTERNAL BLACS_PNUM

* ..

* .. Variable Declaration ..

INTEGER CONTXT, IAM, NPROCS, NPROW, NPCOL, MYPROW, MYPCOL

INTEGER ICALLER, I, J, HISROW, HISCOL

*

* Determine my process number and the number of processes in

* machine

*

CALL BLACS_PINFO(IAM, NPROCS)

*

* If in PVM, create virtual machine if it doesn't exist

*

IF (NPROCS .LT. 1) THEN

IF (IAM .EQ. 0) THEN

WRITE(*, 1000)

2734


READ(*, 2000) NPROCS

END IF

CALL BLACS_SETUP(IAM, NPROCS)

END IF

*

* Set up process grid that is as close to square as possible

*

NPROW = INT( SQRT( REAL(NPROCS) ) )

NPCOL = NPROCS / NPROW

*

* Get default system context, and define grid

*

CALL BLACS_GET(0, 0, CONTXT)

CALL BLACS_GRIDINIT(CONTXT, 'Row', NPROW, NPCOL)

CALL BLACS_GRIDINFO(CONTXT, NPROW, NPCOL, MYPROW, MYPCOL)

*

* If I'm not in grid, go to end of program

*

IF ( (MYPROW.GE.NPROW) .OR. (MYPCOL.GE.NPCOL) ) GOTO 30

*

* Get my process ID from my grid coordinates

*

ICALLER = BLACS_PNUM(CONTXT, MYPROW, MYPCOL)

*

* If I am process {0,0}, receive check-in messages from

* all nodes

*

IF ( (MYPROW.EQ.0) .AND. (MYPCOL.EQ.0) ) THEN

2735

BLACS Routines 16

WRITE(*,*) ' '

DO 20 I = 0, NPROW-1

DO 10 J = 0, NPCOL-1

IF ( (I.NE.0) .OR. (J.NE.0) ) THEN

CALL IGERV2D(CONTXT, 1, 1, ICALLER, 1, I, J)

END IF

*

* Make sure ICALLER is where we think in process grid

*

CALL BLACS_PCOORD(CONTXT, ICALLER, HISROW, HISCOL)

IF ( (HISROW.NE.I) .OR. (HISCOL.NE.J) ) THEN

WRITE(*,*) 'Grid error! Halting . . .'

STOP

END IF

WRITE(*, 3000) I, J, ICALLER

10 CONTINUE

20 CONTINUE

WRITE(*,*) ' '

WRITE(*,*) 'All processes checked in. Run finished.'

*

* All processes but {0,0} send process ID as a check-in

*

ELSE

CALL IGESD2D(CONTXT, 1, 1, ICALLER, 1, 0, 0)

2736


END IF

30 CONTINUE

CALL BLACS_EXIT(0)

1000 FORMAT('How many processes in machine?')

2000 FORMAT(I)

3000 FORMAT('Process {',i2,',',i2,'} (node number =',I,

$ ') has checked in.')

STOP

END

2737

BLACS Routines 16

Example 16-2. BLACS Usage. PROCMAP

This routine maps processes to a grid using blacs_gridmap.

SUBROUTINE PROCMAP(CONTEXT, MAPPING, BEGPROC, NPROW, NPCOL, IMAP)

*

* -- BLACS example code --

* Written by Clint Whaley 7/26/94

* ..

* .. Scalar Arguments ..

INTEGER CONTEXT, MAPPING, BEGPROC, NPROW, NPCOL

* ..

* .. Array Arguments ..

INTEGER IMAP(NPROW, *)

* ..

*

* Purpose

* =======

* PROCMAP maps NPROW*NPCOL processes starting from process BEGPROC to

* the grid in a variety of ways depending on the parameter MAPPING.

*

* Arguments

* =========

*

* CONTEXT (output) INTEGER

* This integer is used by the BLACS to indicate a context.

* A context is a universe where messages exist and do not

* interact with other context's messages. The context

* includes the definition of a grid, and each process's

2738


* coordinates in it.

*

* MAPPING (intput) INTEGER

* Way to map processes to grid. Choices are:

* 1 : row-major natural ordering

* 2 : column-major natural ordering

*

* BEGPROC (input) INTEGER

* The process number (between 0 and NPROCS-1) to use as

* {0,0}. From this process, processes will be assigned

* to the grid as indicated by MAPPING.

*

* NPROW (input) INTEGER

* The number of process rows the created grid

* should have.

*

* NPCOL (input) INTEGER

* The number of process columns the created grid

* should have.

*

* IMAP (workspace) INTEGER array of dimension (NPROW, NPCOL)

* Workspace, where the array which maps the

* processes to the grid will be stored for the

* call to GRIDMAP.

2739

BLACS Routines 16

*

* ===============================================================

*

* ..

* .. External Functions ..

INTEGER BLACS_PNUM

EXTERNAL BLACS_PNUM

* ..

* .. External Subroutines ..

EXTERNAL BLACS_PINFO, BLACS_GRIDINIT, BLACS_GRIDMAP

* ..

* .. Local Scalars ..

INTEGER TMPCONTXT, NPROCS, I, J, K

* ..

* .. Executable Statements ..

*

* See how many processes there are in the system

*

CALL BLACS_PINFO( I, NPROCS )

IF (NPROCS-BEGPROC .LT. NPROW*NPCOL) THEN

WRITE(*,*) 'Not enough processes for grid'

STOP

END IF

*

* Temporarily map all processes into 1 x NPROCS grid

*

CALL BLACS_GET( 0, 0, TMPCONTXT )

CALL BLACS_GRIDINIT( TMPCONTXT, 'Row', 1, NPROCS )

2740


K = BEGPROC

*

* If we want a row-major natural ordering

*

IF (MAPPING .EQ. 1) THEN

DO I = 1, NPROW

DO J = 1, NPCOL

IMAP(I, J) = BLACS_PNUM(TMPCONTXT, 0, K)

K = K + 1W

END DO

END DO

*

* If we want a column-major natural ordering

*

ELSE IF (MAPPING .EQ. 2) THEN

DO J = 1, NPCOL

DO I = 1, NPROW

IMAP(I, J) = BLACS_PNUM(TMPCONTXT, 0, K)

K = K + 1

END DO

END DO

ELSE

WRITE(*,*) 'Uknown mapping.'

STOP

END IF

*

* Free temporary context

2741

BLACS Routines 16

*

CALL BLACS_GRIDEXIT(TMPCONTXT)

*

* Apply the new mapping to form desired context

*

CALL BLACS_GET( 0, 0, CONTEXT )

CALL BLACS_GRIDMAP( CONTEXT, IMAP, NPROW, NPROW, NPCOL )

RETURN

END

2742


ALinear Solvers Basics

Many applications in science and engineering require the solution of a system of linear equations. Thisproblem is usually expressed mathematically by the matrix-vector equation, Ax = b, where A is an m-by-nmatrix, x is the n element column vector and b is the m element column vector. The matrix A is usuallyreferred to as the coefficient matrix, and the vectors x and b are referred to as the solution vector andthe right-hand side, respectively.

Basic concepts related to solving linear systems with sparse matrices are described in section SparseLinear Systems that follows.

If the coefficients in matrix A and right-hand sides in vector b are not defined exactly but rather belongto known intervals, the system is called an interval linear system. Some basic definitions andconcepts used in solving interval linear systems are described in Interval Linear Systems section below.

Sparse Linear SystemsIn many real-life applications, most of the elements in A are zero. Such a matrix is referred to assparse. Conversely, matrices with very few zero elements are called dense. For sparse matrices,computing the solution to the equation Ax = b can be made much more efficient with respect toboth storage and computation time, if the sparsity of the matrix can be exploited. The more analgorithm can exploit the sparsity without sacrificing the correctness, the better the algorithm.

Generally speaking, computer software that finds solutions to systems of linear equations is calleda solver. A solver designed to work specifically on sparse systems of equations is called a sparsesolver. Solvers are usually classified into two groups - direct and iterative.

Iterative Solvers start with an initial approximation to a solution and attempt to estimate thedifference between the approximation and the true result. Based on the difference, an iterativesolver calculates a new approximation that is closer to the true result than the initial approximation.This process is repeated until the difference between the approximation and the true result issufficiently small. The main drawback to iterative solvers is that the rate of convergence dependsgreatly on the values in the matrix A. Consequently, it is not possible to predict how long it will takefor an iterative solver to produce a solution. In fact, for ill-conditioned matrices, the iterative processwill not converge to a solution at all. However, for well-conditioned matrices it is possible for iterativesolvers to converge to a solution very quickly. Consequently for the right applications, iterativesolvers can be very efficient.

Direct Solvers, on the other hand, often factor the matrix A into the product of two triangularmatrices and then perform a forward and backward triangular solve.

2743

This approach makes the time required to solve a systems of linear equations relativelypredictable, based on the size of the matrix. In fact, for sparse matrices, the solution time canbe predicted based on the number of non-zero elements in the array A.

Matrix Fundamentals

A matrix is a rectangular array of either real or complex numbers. A matrix is denoted by acapital letter; its elements are denoted by the same lower case letter with row/column subscripts.Thus, the value of the element in row i and column j in matrix A is denoted by a(i,j). Forexample, a 3 by 4 matrix A, is written as follows:

Note that with the above notation, we assume the standard Fortran programming languageconvention of starting array indices at 1 rather than the C programming language conventionof starting them at 0.

A matrix in which all of the elements are real numbers is called a real matrix. A matrix thatcontains at least one complex number is called a complex matrix. A real or complex matrix Awith the property that a(i,j) = a(j,i), is called a symmetric matrix. A complex matrix A withthe property that a(i,j) = conj(a(j,i)), is called a Hermitian matrix. Note that programs thatmanipulate symmetric and Hermitian matrices need only store half of the matrix values, sincethe values of the non-stored elements can be quickly reconstructed from the stored values.

A matrix that has the same number of rows as it has columns is referred to as a square matrix.The elements in a square matrix that have same row index and column index are called thediagonal elements of the matrix, or simply the diagonal of the matrix.

The transpose of a matrix A is the matrix obtained by “flipping” the elements of the array aboutits diagonal. That is, we exchange the elements a(i,j) and a(j,i). For a complex matrix, ifwe both flip the elements about the diagonal and then take the complex conjugate of theelement, the resulting matrix is called the Hermitian transpose or conjugate transpose of theoriginal matrix. The transpose and Hermitian transpose of a matrix A are denoted by AT andAH respectively.

2744

A Intel® Math Kernel Library Reference Manual

A column vector, or simply a vector, is a n × 1 matrix, and a row vector is a 1 × n matrix. Areal or complex matrix A is said to be positive definite if the vector-matrix product xTAx isgreater than zero for all non-zero vectors x. A matrix that is not positive definite is referred toas indefinite.

An upper (or lower) triangular matrix, is a square matrix in which all elements below (or above)the diagonal are zero. A unit triangular matrix is an upper or lower triangular matrix with all1's along the diagonal.

A matrix P is called a permutation matrix if, for any matrix A, the result of the matrix productPA is identical to A except for interchanging the rows of A. For a square matrix, it can be shownthat if PA is a permutation of the rows of A, then APT is the same permutation of the columnsof A. Additionally, it can be shown that the inverse of P is PT.

In order to save space, a permutation matrix is usually stored as a linear array, called apermutation vector, rather than as an array. Specifically, if the permutation matrix maps thei-th row of a matrix to the j-th row, then the i-th element of the permutation vector is j.

A matrix with non-zero elements only on the diagonal is called a diagonal matrix. As is the casewith a permutation matrix, it is usually stored as a vector of values, rather than as a matrix.

Direct Method

For solvers that use the direct method, the basic technique employed in finding the solution ofthe system Ax = b is to first factor A into triangular matrices. That is, find a lower triangularmatrix L and an upper triangular matrix U, such that A = LU. Having obtained such afactorization (usually referred to as an LU decomposition or LU factorization), the solution tothe original problem can be rewritten as follows.

Ax = b

LUx = b

(Ux) = b

This leads to the following two-step process for finding the solution to the original system ofequations:

1. Solve the systems of equations Ly = b.

2. Solve the system Ux = y.

Solving the systems Ly = b and Ux = y is referred to as a forward solve and a backward solve,respectively.

2745

Linear Solvers Basics A

If a symmetric matrix A is also positive definite, it can be shown that A can be factored as LLT

where L is a lower triangular matrix. Similarly, a Hermitian matrix, A, that is positive definitecan be factored as A = LLH. For both symmetric and Hermitian matrices, a factorization of thisform is called a Cholesky factorization.

In a Cholesky factorization, the matrix U in an LU decomposition is either LT or LH. Consequently,a solver can increase its efficiency by only storing L, and one-half of A, and not computing U.Therefore, users who can express their application as the solution of a system of positive definiteequations will gain a significant performance improvement over using a general representation.

For matrices that are symmetric (or Hermitian) but not positive definite, there are still somesignificant efficiencies to be had. It can be shown that if A is symmetric but not positive definite,then A can be factored as A = LDLT, where D is a diagonal matrix and L is a lower unit triangularmatrix. Similarly, if A is Hermitian, it can be factored as A = LDLH. In either case, we againonly need to store L, D, and half of A and we need not compute U. However, the backward solvephases must be amended to solving LTx = D-1y rather than LTx = y.

Fill-In and Reordering of Sparse Matrices

Two important concepts associated with the solution of sparse systems of equations are fill-inand reordering. The following example illustrates these concepts.

Consider the system of linear equation Ax = b, where A is the symmetric positive definitesparse matrix defined by the following:

2746


A star (*) is used to represent zeros and to emphasize the sparsity of A. The Choleskyfactorization of A is: A = LLT, where L is the following:

Notice that even though the matrix A is relatively sparse, the lower triangular matrix L has nozeros below the diagonal. If we computed L and then used it for the forward and backwardsolve phase, we would do as much computation as if A had been dense.

The situation of L having non-zeros in places where A has zeros is referred to as fill-in.Computationally, it would be more efficient if a solver could exploit the non-zero structure ofA in such a way as to reduce the fill-in when computing L. By doing this, the solver would onlyneed to compute the non-zero entries in L. Toward this end, consider permuting the rows andcolumns of A. As described in Matrix Fundamentals section , the permutations of the rows of Acan be represented as a permutation matrix, P. The result of permuting the rows is the productof P and A. Suppose, in the above example, we swap the first and fifth row of A, then swap thefirst and fifth columns of A, and call the resulting matrix B. Mathematically, we can express theprocess of permuting the rows and columns of A to get B as B = PAPT. After permuting therows and columns of A, we see that B is given by the following:

2747


Since B is obtained from A by simply switching rows and columns, the numbers of non-zeroentries in A and B are the same. However, when we find the Cholesky factorization, B = LLT,we see the following:

2748


The fill-in associated with B is much smaller than the fill-in associated with A. Consequently,the storage and computation time needed to factor B is much smaller than to factor A. Basedon this, we see that an efficient sparse solver needs to find permutation P of the matrix A,which minimizes the fill-in for factoring B = PAPT, and then use the factorization of B to solvethe original system of equations.

Although the above example is based on a symmetric positive definite matrix and a Choleskydecomposition, the same approach works for a general LU decomposition. Specifically, let P bea permutation matrix, B = PAPT and suppose that B can be factored as B = LU. Then

Ax = b

PA(P-1P)x = Pb

PA(PTP)x = Pb

(PAPT)(Px) = Pb

B(Px) = Pb

LU(Px) = Pb

It follows that if we obtain an LU factorization for B, we can solve the original system of equationsby a three step process:

1. Solve Ly = Pb.

2. Solve Uz = y.

3. Set x = PTz.

If we apply this three-step process to the current example, we first need to perform the forwardsolve of the systems of equation Ly = Pb:

2749


This gives:

The second step is to perform the backward solve, Uz = y. Or, in this case, since we are usinga Cholesky factorization, LTz = y.

2750


This gives

The third and final step is to set x = PTz. This gives

Sparse Matrix Storage Formats

As discussed above, it is more efficient to store only the non-zeros of a sparse matrix. Thisassumes that the sparsity is large, that is, the number of non-zero entries is a small percentageof the total number of entries. If there is only an occasional zero entry, the cost of exploitingthe sparsity actually slows down the computation when compared to simply treating the matrixas dense, meaning that all the values, zero and non-zero, are used in the computation.

2751


There are a number of common storage schemes used for sparse matrices, but most of theschemes employ the same basic technique. That is, compress all of the non-zero elements ofthe matrix into a linear array, and then provide some number of auxiliary arrays to describethe locations of the non-zeros in the original matrix.

Storage Formats for the PARDISO Solver

The compression of the non-zeros of a sparse matrix A into a linear array is done by walkingdown each column (column major format) or across each row (row major format) in order, andwriting the non-zero elements to a linear array in the order that they appear in the walk.

When storing symmetric matrices, it is necessary to store only the upper triangular half of thematrix (upper triangular format) or the lower triangular half of the matrix (lower triangularformat).

The Intel MKL direct sparse solver uses a row major upper triangular storage format. That is,the matrix is compressed row-by-row and for symmetric matrices only non-zeros in the uppertriangular half of the matrix are stored.

The Intel MKL storage format accepted for the PARDISO software for sparse matrices consistsof three arrays, which are called the values, columns, and rowIndex arrays. The followingtable describes the arrays in terms of the values, row, and column positions of the non-zeroelements in a sparse matrix A.

A real or complex array that contains the non-zero entries of A. Thenon-zero values of A are mapped into the values array using the rowmajor, upper triangular storage mapping described above.

values

Element i of the integer array columns contains the number of thecolumn in A that contained the value in values(i).

columns

Element j of the integer array rowIndex gives the index into the valuesarray that contains the first non-zero element in a row j of A.

rowIndex

The length of the values and columns arrays is equal to the number of non-zeros in A.

Since the rowIndex array gives the location of the first non-zero within a row, and the non-zerosare stored consecutively, then the number of non-zeros in the i-th row is equal to the differenceof rowIndex(i) and rowIndex(i+1).

In order to have this relationship hold for the last row of A, an additional entry (dummy entry)is added to the end of rowIndex whose value is equal to the number of non-zeros in A, plusone. This makes the total length of the rowIndex array one larger than the number of rows ofA.

2752


NOTE. The Intel MKL sparse storage scheme uses the Fortran programming languageconvention of starting array indices at 1, rather than the C programming languageconvention of starting at 0.

With the above in mind, consider storing the symmetric matrix discussed in the example fromthe previous section.

In this case, A has nine non-zero elements, so the lengths of the values and columns arrayswill be nine. Also, since the matrix A has five rows, the rowIndex array is of length six. Theactual values for each of the arrays for the example matrix are as follows:

Table A-1 Storage Arrays for a Symmetric Example Matrix

one base indexing

16)5/81/21/233/463/2(9=values

5)4325432(1=columns

10)9876(1=rowIndex

zero base indexing

16)5/81/21/233/463/2(9=values

4)3214321(0=columns

9)8765(0=rowIndex

2753


For a non-symmetric or non-Hermitian array, all of the non-zeros need to be stored. Considerthe non-symmetric matrix B defined by the following:

We see that B has 13 non-zeros, and we store B as follows:

Table A-2 Storage Arrays for a Non-Symmetric Example Matrix

one base indexing

-5)872-44645-2-3-1(1=values

5)24315432142(1=columns

14)12964(1=rowIndex

zero baseindexing

-5)872-44645-2-3-1(1=values

4)13204321031(0=columns

13)11853(0=rowIndex

In the current version of Intel MKL, direct sparse solvers cannot solve non-symmetric systemsof equations. However, it can solve symmetrically structured systems of equations. Asymmetrically structured system of equations is one where the pattern of non-zeros is symmetric.That is, a matrix has a symmetric structure if a(j,i)is non-zero if and only if a(j, i) is non-zero.From the point of view of the solver software, a non-zero element of a matrix is anything thatis stored in the values array. In that sense, we can turn any non-symmetric matrix into asymmetrically structured matrix by carefully adding zeros to the values array. For example,suppose we consider the matrix B to have the following set of non-zero entries:

2754


Now B can be considered to be symmetrically structured with 15 non-zero entries.We wouldrepresent the matrix as:

Table A-3 Storage Arrays for a Symmetrically Structured Example Matrix

-5)0872-446405-2-3-1(1=values

5)3243154352142(1=columns

16)131074(1=rowIndex

Storage Format Restrictions

The storage format for the sparse solver must conform to two important restrictions:

First, the non-zero values in a given row must be placed into the values array in the order inwhich they occur in the row (from left to right). Second, no diagonal element can be omittedfrom the values array for any symmetric or structurally symmetric matrix.

The second restriction implies that when dealing with symmetric or structurally symmetricmatrices that have zeros on the diagonal, the zero diagonal elements must be explicitlyrepresented in the values array.

Sparse Storage Formats for Sparse BLAS Levels 2-3

This section describes in detail the sparse data structures supported in the current version ofthe Intel MKL Sparse BLAS level 2 and 3.

2755


CSR Format

The Intel MKL compressed sparse row (CSR) format for sparse matrices consists of four arrays,which are called the values, columns, pointerB, and pointerE arrays. The following tabledescribes the arrays in terms of the values, row, and column positions of the non-zero elementsin a sparse matrix A.

A real or complex array that contains the non-zero entries of A. Thenon-zero values of A are mapped into the values array using the rowmajor storage mapping described above.

values


columns

Element j of this integer array gives the index into the values arraythat contains the first non-zero element in a row j of A. Note that thisindex is equal to pointerB(j) - pointerB(1)+1 .

pointerB

An integer array contains row indices, such thatpointerE(j)-pointerB(1) is the index into the values array thatcontains the last non-zero element in a row j of A.

pointerE

The length of the values and columns arrays is equal to the number of non-zeros in A.Thelength of the pointerB and pointerE arrays is equal to the number of rows in A.

Previously defined matrix B

can be represented in the CSR format as:

Table A-4 Storage Arrays for an Example Matrix in CSR Format

one base indexing

-5)872-44645-2-3-1(1=values

2756


5)24315432142(1=columns

12)964(1=pointerB

14)1296(4=pointerE

zero base indexing

-5)872-44645-2-3-1(1=values

4)13204321031(0=columns

11)853(0=pointerB

13)1185(3=pointerE

This storage format is used in the NIST Sparse BLAS library [Rem05].

Note that the storage format accepted for the PARDISO software and described above (seeStorage Formats for the PARDISO Solver), is a variation of the CSR format. The PARDISOformat has a restriction - all non-zero elements are stored continuously, that is the set ofnon-zero elements in the row J goes just after the set of non-zero elements in the row J-1.

There is no such restrictions in the CSR format. This advantage can be useful, for example, ifthere is a need to operate with different submatrices of the matrix at the same time. In thiscase, it is enough to define the arrays pointerB and pointerE for each needed submatrix sothat all these array are pointers to the one array values.

Comparing the array rowIndex from the Table A-2 with the arrays pointerB and pointerEfrom the Table A-4 it is easy to see that

pointerB(i) = rowIndex(i) for i=1, ..5;

pointerE(i) = rowIndex(i+1) for i=1, ..5.

This gives the possibility to call a routine that has values, columns, pointerB and pointerEas input parameters for a sparse matrix stored in the format accepted for PARDISO. For example,a routine with the interface:

Subroutine name_routine(.... , values, columns, pointerB, pointerE, ...)

can be called with arguments values, columns, rowIndex in the following way:

call name_routine(.... , values, columns, rowIndex, rowindex(2), ...).

NOTE. Intel MKL Sparse BLAS level 2 provide routines for both flavors of the CSR format.

2757


CSC Format

The compressed sparse column format (CSC), often called Harwell-Boeing sparse matrixformat, is similar to the CSR format, but the columns are used instead the rows. Or, in otherwords, CSC format is equal to the CSR format for the transposed matrix.

By analogy with the CSR format Intel MKL Sparse BLAS level 2 library provides routines fortwo variations of the CSC format.

Variation of this format accepted for the PARDISO software consists of three arrays, which arecalled the values, rows, and colIndex arrays. The following table describes these arrays:

A real or complex array that contains the non-zero entries of A. Thenon-zero values of A are mapped into the values array using the columnmajor storage mapping described above.

values

Element i of the integer array rows contains the number of the row inA that contained the value in values(i).

rows

Element j of the integer array colIndex gives the index into the valuesarray that contains the first non-zero element in a column j of A.

colIndex

The length of the values and rows arrays is equal to the number of non-zero elements in A.

For example, the sparse matrix B

can be represented in the CSC format for PARDISO as follows:

Table A-5 Storage Arrays for an Example Matrix in the Harwell-Boeing format

-5)476-32485-1-4-2(1=values

5)24315432142(1=rows

14)12974(1=colIndex

2758


Coordinate Format

The coordinate format is the most flexible and simplest format for the sparse matrixrepresentation. Only nonzero entries are provided, and the coordinates of each nonzero entryis given explicitly. Many commercial libraries support the matrix-vector multiplication for thesparse matrices in the coordinate format.

The Intel MKL coordinate format consists of three arrays, which are called the values, rows,and column arrays, and a parameter nnz which is number of non-zero entries in A. All threearrays have to be dimensioned as nnz. The following table describes the arrays in terms of thevalues, row, and column positions of the non-zero elements in a sparse matrix A.

A real or complex array that contains the non-zero entries of A givenin any order.

values

Element i of the integer array rows contains the number of the row inA that contained the value in values(i).

rows


columns

For example, the sparse matrix C

can be represented in the coordinate format as follows:

Table A-6 Storage Arrays for an Example Matrix in case of the coordinate format

-5)872-44645-2-3-1(1=values

5)54443332211(1=rows

5)24315432132(1=columns

2759


Diagonal Storage Scheme

If the matrix A has a few diagonals, then this structure can be used to reduce the amount ofinformation needed for the location of the non-zero elements. This storage scheme is particularlyuseful in many applications where the matrix arises from a finite element or finite differencediscretization. The Intel MKL diagonal storage scheme consists of two arrays, which are calledthe values and distance arrays, and parameters ndiag which is the number of non-emptydiagonals, and lval which is declared leading dimension in the calling (sub) program. Thefollowing table describes the arrays values and distance:

A real or complex two dimensional array is dimensioned as lval byndiag. It contains the non-zero diagonals of A. The key point of thestorage is that each element in values retains the row corresponding

values

to the row in the original matrix. In order to do so diagonals in thelower triangular part of A are padded from the top, and those in theupper triangular part are padded from the bottom. Note that the valueof distance(i) is the number of elements to be padded for diagonali.

An integer array is dimensioned as ndiag. Element i of the array integerdistance contains the distance between i-diagonal and the maindiagonal. The distance is positive if the diagonal is above the maindiagonal, and negative if the diagonal is below the main diagonal. Themain diagonal has a distance equal to zero.

distance

The sparse matrix C given above can be stored in the diagonal storage scheme as follows:

where the asterisks denote padded elements.

2760


It is clear that the upper triangle or lower triangle can be stored if the matrix is symmetric,hermitian, or skew-symmetric.

The diagonals can be stored in any order if the sparse diagonal representation is used for IntelMKL sparse matrix-matrix or matrix-vector multiplication routines. However, all elements ofthe array distance must be sorted in increasing order if the sparse diagonal representation isused for Intel MKL sparse triangular solver routines.

Skyline Storage Scheme

The skyline storage scheme is important in the direct sparse solvers, and it is well suited forCholesky or LU decomposition when no pivoting is required.

The skyline storage scheme accepted in the Intel MKL can store only triangular matrix ortriangular part of the matrix. This variant consists of two arrays which are called values andpointers arrays. The following table describes these arrays:

A scalar array. It contains the set of elements from each row of Astarting from the first non-zero elements to and uncluding the diagonalelement if the matrix is lower triangular, and the set of elements from

values

each column of A starting with the first non-zero element down to andincluding the diagonal element. Encountered zero elements are includedin the sets.

An integer array is dimensioned as m+1, where m is the number of rowsfor lower triangle (columns for the upper triangle). pointers(i) -pointers(1)+1 points to the location in values of the first non-zero

pointers

element in row (column) i. The value of pointers(m+1) is set to thevalue nnz+pointers(1), where nnz is the number of elements in thearray values.

Note that Intel MKL Sparse BLAS does not support general matrices for the routines operatingwith the skyline storage format.

For example, the low triangle of the matrix C given above can be stored as follows:

values = (1 -2 5 4 -4 0 2 7 8 0 0 -5 )

pointers = ( 1 2 4 5 9 13 )

and the upper triangle of this matrix C can be stored as follows:

values = ( 1 -1 5 -3 0 4 6 7 4 0 -5 )

pointers = ( 1 2 4 7 9 12 )

2761


This storage format is supported by the NIST Sparse BLAS library [Rem05].

BSR Format

The Intel MKL block compressed sparse row (BSR) format for sparse matrices consists of fourarrays, which are called the values, columns, pointerB, and pointerE arrays. The followingtable describes these arrays.

A real array that contains the non-zero blocks of a sparse matrix storingthem block by block in row-wise fashion. A non-zero block is the blockthat contains at least one non-zero element. All elements of non-zero

values

blocks are stored, even if some of them is equal to zero. Elements ofnonzero blocks are stored in column major order within each denseblock in the case of the Fortran programming language convention,and in row major order in the case of the C programming languageconvention.

Element i of the integer array columns contains the number of thecolumn in the block matrix that contains the i-th non-zero block.

columns

Element j of this integer array gives the index into the columns arraythat contains the first non-zero block in a row j of the block matrix.

pointerB

Element j of this integer array gives the index into the columns arraythat contains the last non-zero block in a row j of the block matrix plus1.

pointerE

The length of the values array is equal to the number of all elements in the non-zero blocks,the length of the columns array is equal to the number of non-zero blocks. The length of thepointerB and pointerE arrays is equal to the number of rows in the block matrix.

For example, consider the sparse matrix D

2762


If the size of the block equals to 2, then the sparse matrix D can be represented as a 3x3 blockmatrix E with the following structure:

where

The matrix D can be represented in the BSR format as follows:

one-based indexing

values = (1 2 0 1 6 8 7 2 1 5 4 2 4 0 3 0 7 0 2 0 )

columns = ( 1 2 2 2 3 )

pointerB = ( 1 3 4 )

pointerE = ( 3 4 6 )

zero-based indexing

values = (1 2 0 1 6 8 7 2 1 5 4 2 4 0 3 0 7 0 2 0 )

columns = ( 0 1 1 1 2 )

pointerB = ( 0 2 3 )

pointerE = ( 2 3 5 )

This storage format is supported by the NIST Sparse BLAS library [Rem05].

The Intel MKL supports the PARDISO variation of the block compressed sparse row (BSR) formatfor sparse matrices. This format consists of three arrays, which are called the values, columns,and rowIndex arrays. The following table describes these arrays.

2763


A real array that contains the non-zero blocks of a sparse matrix storingthem block by block in row-wise fashion. A non-zero block is the blockthat contains at least one non-zero element. All elements of non-zero

values

blocks are stored, even if some of them is equal to zero. Elements ofnonzero blocks are stored in column major order within each block inthe case of the Fortran programming language convention, and in rowmajor order in the case of the C programming language convention.

Element i of the integer array columns contains the number of thecolumn in the block matrix that contains the i-th non-zero block.

columns

Element j of this integer array gives the index in the columns arraythat contains the first non-zero block in a row j of the block matrix.

rowIndex

The length of the values array is equal to the number of all elements in the non-zero blocks,the length of the columns array is equal to the number of non-zero blocks.

Since the rowIndex array gives the location of the first non-zero block within a row, and thenon-zero blocks are stored consecutively, then the number of non-zero blocks in the i-th rowis equal to the difference of rowIndex(i) and rowIndex(i+1).

In order to have this relationship hold for the last row of the block matrix, an additional entry(dummy entry) is added to the end of rowIndex whose value is equal to the number of non-zerosblocks plus one. This makes the total length of the rowIndex array one larger than the numberof rows of the block matrux.

The above matrix D can be represented in the BSR format (PARDISO) variation as follows:

one-based indexing

values = (1 2 0 1 6 8 7 2 1 5 4 2 4 0 3 0 7 0 2 0 )

columns = ( 1 2 2 2 3 )

rowIndex = ( 1 3 4 6 )

zero-based indexing

values = (1 2 0 1 6 8 7 2 1 5 4 2 4 0 3 0 7 0 2 0 )

columns = ( 0 1 1 1 2 )

rowIndex = ( 0 2 3 5 )

When storing symmetric matrices, it is necessary to store only the upper triangular half of thematrix (upper triangular format) or the lower triangular half of the matrix (lower triangularformat).

For example, consider the symmetric sparse matrix F:

2764


If the size of the block equals to 2, then the sparse matrix F can be represented as a 3x3 blockmatrix G with the following structure:

where

The symmetric matrix F can be represented in the PARDISO variation of the BSR format (storingonly upper triangular) as follows:

one-based indexing

values = (1 2 0 1 6 8 7 2 1 5 4 2 7 0 2 0 )

columns = ( 1 2 2 3 )

rowIndex = ( 1 3 4 5 )

2765


zero-based indexing

values = (1 2 0 1 6 8 7 2 1 5 4 2 4 0 3 0 7 0 2 0 )

columns = ( 0 1 1 2 )

rowIndex = ( 0 2 3 4 )

Interval Linear Systems

Intervals

An interval is a compact connected subset of the real axis R. It is thus completely defined bytwo numbers, namely, its lower endpoint and upper endpoint (sometimes called leftendpoint and right endpoint respectively), so that [a, b] denotes the interval

{x ∈ R|a ≤ x≤b}. The set of all real intervals is denoted by IR. In mathematical notation,taking the lower and upper endpoints of an interval is usually denoted byinf[a, b] = a, sup[a, b] = b.

In the discussion below, intervals and interval objects are denoted by boldface letters, whileunderscores and overscores designate the lower and upper endpoints of the interval x = [x,x]

Every interval is uniquely determined by its midpoint,

and radius,

2766


the latter being equivalent to the width wid a = a--a. Intervals of the form [a, a] that haveequal lower and upper endpoints, that is, intervals of zero width, are called degenerate or

point or thin, and they coincide with usual real numbers so that it can be implied R ⊂ IR.On the contrary, the intervals with nonzero width are called thick intervals.

Since intervals are sets, set-theoretical relations and operations between them are applicable,

for example, inclusion, intersection, and so on. In particular, a point t ∈ R is a member of the

interval a (written as t ∈ a) if a ≤ t ≤ a. Also, the inclusion is defined as a ⊆ b if and only

if a ≥ b and a ≤ b

Intervals and interval objects (vectors, matrices, etc.) are a convenient tool to represent theso-called bounded uncertainty and ambiguity, when only the lower and upper bounds of thepossible variation of some value are known. In this sense, intervals provide an alternative toprobabilistic and fuzzy approaches for describing quantitative uncertainty.

Arithmetic operations, such as addition, subtraction, multiplication and division, can be extendedto intervals according to the fundamental principle

(1)a*b: = {a*b|a ∈ a,b∈b}, *∈ {+, -, ·,/},

which makes it possible to define the so-called classical interval arithmetic. Note that

the empty interval [∅] is often incorporated into the computer interval arithmetic structures.

Interval vectors and matrices

An interval vector is an ordered tuple of intervals placed vertically(column vector) or horizontally(row vector). So, if a1, a2, ... , an are intervals, then

and

a1, a2, ... , an is a row vector.

2767


The set of all interval n-vectors is denoted later in the text by IRn.

The interval vectors can be associated with their geometric images, namely rectangular boxesof the space Rn, whose sides are parallel to the coordinate axes. For this reason, interval vectorsare often called boxes for brevity.

An interval matrix is a rectangular table composed of the intervals:

or A = (aij). Interval vectors can be identified with interval matrices either of the size n × 1(column vectors) or 1 × n (row vectors). The set of all interval m × n-matrices is denoted byIRm×n. Arithmetic operations between interval vectors and matrices can be introduced on thebasis of the relation that generalizes (1) (see [Alefeld83], [Neumaier90]).

An interval square matrix A ∈ IRn×n is referred to as regular (nonsingular) if and only if all

the point matrices A ∈ A are regular (nonsingular), that is, have nonzero determinants.

Otherwise, the interval matrix A ∈ IRn×n is called singular, which means that it contains atleast one singular point matrix.

Generally, recognition of whether an interval matrix is regular or singular is an NP-hard problem,which implies that there may be no relatively simple (polynomially complex) algorithms thatcompletely solve the problem in a reasonable time.

2768


For practical needs, it is important to have a set of workable sufficient criteria for testingregularity of a wide range of interval matrices. Intel MKL provides routines that implementRis-Beeck spectral criterion, Rump singular value criterion, as well as Rohn-Rex singular valuecriterion for testing regularity/singularity of interval matrices.

Sometimes, a related property (called strong regularity) needs to be checked for intervalmatrices. Strong regularity requires that the product of the interval matrix by its midpointinverse is regular. The routine ?gerbr enables to check the strong regularity judging by thevalue of its output parameter sr.

Interval Linear Systems

Solving systems of linear algebraic equations of the form

(2)

or, concisely,

Ax = b

2769


with an m × n matrix A and a right-hand side m-vector b, is one of the key problems in scienceand engineering. If aij and bi are not defined exactly but rather belong to known intervals aijand bi respectively, the system is called an interval linear system and can be written as

(3)

with intervals aij and bi, or in a short form as

(4)Ax = b

with an interval matrix A = ( aij) and interval right-hand side vector b = ( bi). An intervallinear system (3)–(4) is considered as a set of point linear systems of the same form Ax = b

with the parameters aij and bi such that aij ∈ aij and bi ∈ bi.

When aij and bi are changing within intervals aij and bi, the solutions to the correspondingpoint systems Ax = b with A = ( aij) and b = (bi) form a set in the space Rn, namely

(5)Ξ(A, b := {x ∈ Rn|(∃A ∈ A)(∃b ∈ b)(Ax = b)}.

The set (5), made up of solutions to all the point systems Ax = b with A ∈ A and b ∈ b, iscalled a solution set to the interval linear system (3)–(4). Usually, the solution set is a solid

polyhedron in Rn for independent aij and bi, 1≤i, j≤n, sometimes star-shaped as in the figurebelow.

2770


NOTE. The above described set Ξ(A,B) is often called a untied solution set, sincethere exist a variety of other solution sets to interval systems of equations (see[Shary02]).

An exact description of the solution set is practically impossible for dimensions n larger thanseveral tens, since its complexity grows exponentially with n. On the other hand, such an exactdescription is not really necessary in most cases. Usually, one needs to compute someestimates, in a prescribed sense, of the solution set. The most popular in practice is thefollowing problem of outer (by supersets) interval estimation:

(6)For an interval system of linear equations Ax = b

find an interval enclosure of the solution set Ξ(A, b).

Frequently, a component-wise form of the problem (6) is considered:

(7)For an interval system of linear equations Ax = b

find estimates for min{xv|x∈Ξ(A, b)} from below,

for max{xv|x∈Ξ(A, b)}from above, v = 1, 2, . . . , n.

In particular, Intel MKL ?gepps routines operate with this type of the problem statement.

The problem (6)–(7) is one of the historically first and most popular in modern interval analysis.You can find an extensive bibliography on this problem, for example, in [Alefeld83], [Kearfott96], [Neumaier90]).

Thus, solving an interval linear system is understood here as computing an outer intervalestimate of the solution set to an interval linear system (3)–(4). The matrix A of the system isusually assumed to be square nonsingular.

2771


Unlike classical computational linear algebra, solving interval linear systems proves to be verycomputationally hard in general. Computing the optimal (smallest) interval enclosures of thesolution in (6), or, equivalently, computing exact estimates of the solution set in (7), is anNP-hard problem (see [Kreinovich97]), if there are no restrictions on the widths of the intervalsin the system and/or the structure of nonzero elements in the matrix A. Moreover, the problemremains NP-hard even if we weaken the requirements on the solution and compute estimatesof the solution sets that must be precise to within a predetermined absolute or relative accuracy.

From the practical standpoint, NP-hardness means that with a high probability a general problemcannot be solved in polynomial time with respect to problem size.

For this reason, numerical algorithms employed in Intel MKL for solving interval linear systemsare divided into two classes depending on whether or not they provide a guaranteed accuracyof the result. “Fast” algorithms work fast and compute an enclosure of the solution set in areasonable time, but without any accuracy assumptions. “Optimal”, or “sharp” algorithms maytake a lot of time to complete execution, but the results they obtain are less crude and maysatisfy some accuracy requirements.

Intel MKL includes interval solver routines that implement algorithms of both types. For example,fast methods, such as interval Gauss method, interval Householder method, Hansen-Bliek-Rohnmethod, and Krawczyk iteration, are implemented in routines ?gegas, ?gehss, ?gehbs, and?gekws, respectively. Parameter partitioning method (PPS-method) implemented in ?geppsroutine is an example of a sharp method. The routine ?trtrs is subsumed under both categoriesdue to a very special matrix structure.

Preconditioning

Preconditioning of interval linear system (4) is multiplying both the matrix A and theright-hand side vector b by a point matrix, with the intension to improve the properties of thesystem. So the system Ax = b is substituted by the following system

(CA)x = Cb

where C is some square point matrix. Preconditioning is widely used in classical computationallinear algebra, and many interval solver algorithms (for example, interval Gauss method,interval Gauss-Seidel method and some others) also require a suitable preconditioning prior totheir use.

One of the widely used preconditioning methods for the interval linear systems is preconditioningdone by the inverse of the midpoint matrix, often called midpoint-inverse preconditioning.In Intel MKL, the midpoint inverse preconditioning is implemented in the routine ?gemip.

2772


Inverting interval matrices

Given an interval square matrix A, an enclosure for the set of all inverse point matrices in A iscalled the inverse interval matrixA-1, that is,

A-1⊇{A-1|A ∈ A}.

In classical linear algebra, the solution to a system of linear algebraic equations Ax = b withsquare nonsingular matrix A can be expressed as the product of the inverse A-1 by the right-handside vector, or x = A-1b.

In interval analysis, the similar product A-1b also produces an enclosure for the solution set

ΞA,b of the interval linear system Ax = b. However, this method usually causes substantialoverestimation and is not recommended. Using specialized procedures for outer estimation ofthe solution sets is preferable.

Nevertheless, computing tight enclosures for inverse interval matrices is essential insensitivity-like analysis of equation systems and the like.

Computing the inverse interval matrix may be carried out as finding an enclosure for the solutionset of the following interval matrix equation

AY = I, where I is the identity matrix,

by applying n times (for every column of the matrix Y ) any method to solve the interval linearsystems.

Note also that direct iterative procedures for finding the inverse interval matrix exist, such asSchulz method (see [Herzberger94]), which is included into Intel MKL as ?geszi routine.

2773


BRoutine and Function Arguments

The major arguments in the BLAS routines are vector and matrix, whereas VML functions work on vectorarguments only. The sections that follow discuss each of these arguments and provide examples.

Vector Arguments in BLASVector arguments are passed in one-dimensional arrays. The array dimension (length) and vectorincrement are passed as integer variables. The length determines the number of elements in thevector. The increment (also called ) determines the spacing between vector elements and the orderof the elements in the array in which the vector is passed.

A vector of length n and increment incx is passed in a one-dimensional array x whose values aredefined asx(1), x(1+|incx|), ..., x(1+(n-1)* |incx|)

If incx is positive, then the elements in array x are stored in increasing order. If incx is negative,the elements in array x are stored in decreasing order with the first element defined as x(1+(n-1)*|incx|). If incx is zero, then all elements of the vector have the same value, x(1). The dimensionof the one-dimensional array that stores the vector must always be at leastidimx = 1 + (n-1)* |incx |

Example B-1 One-dimensional Real Array

Let x(1:7) be the one-dimensional real array

x = (1.0, 3.0, 5.0, 7.0, 9.0, 11.0, 13.0).

If incx =2 and n = 3, then the vector argument with elements in order from first to last is (1.0,5.0, 9.0).

If incx = -2 and n = 4, then the vector elements in order from first to last is (13.0, 9.0, 5.0,1.0).

If incx = 0 and n = 4, then the vector elements in order from first to last is (1.0, 1.0, 1.0,1.0).

One-dimensional substructures of a matrix, such as the rows, columns, and diagonals, can be passedas vector arguments with the starting address and increment specified. In Fortran, storing the m-by-nmatrix is based on column-major ordering where the increment between elements in the samecolumn is 1, the increment between elements in the same row is m, and the increment betweenelements on the same diagonal is m + 1.

2775

Example B-2 Two-dimensional Real Matrix

Let a be the real 5 x 4 matrix declared as REAL A (5,4).

To scale the third column of a by 2.0, use the BLAS routine sscal with the following callingsequence:

callsscal (5, 2.0, a(1,3), 1)

To scale the second row, use the statement:

callsscal (4, 2.0, a(2,1), 5)

To scale the main diagonal of A by 2.0, use the statement:

callsscal (5, 2.0, a(1,1), 6)

NOTE. The default vector argument is assumed to be 1.

Vector Arguments in VMLVector arguments of VML mathematical functions are passed in one-dimensional arrays withunit vector increment. It means that a vector of length n is passed contiguously in an array awhose values are defined as a[0], a[1], ..., a[n-1] (for C- interface).

To accommodate for arrays with other increments, or more complicated indexing, VML containsauxiliary pack/unpack functions that gather the array elements into a contiguous vector andthen scatter them after the computation is complete.

Generally, if the vector elements are stored in a one-dimensional array a as

a[m0], a[m1], ..., a[mn-1]

and need to be regrouped into an array y as

y[k0], y[k1], ..., y[kn-1],

VML pack/unpack functions can use one of the following indexing methods:

Positive Increment Indexingkj = incy * j, mj = inca * j, j = 0 ,..., n-1

Constraint: incy > 0 and inca > 0.

For example, setting incy = 1 specifies gathering array elements into a contiguous vector.

2776

B Intel® Math Kernel Library Reference Manual

This method is similar to that used in BLAS, with the exception that negative and zero incrementsare not permitted.

Index Vector Indexingkj = iy[j], mj = ia[j], j = 0 ,..., n-1,

where ia and iy are arrays of length n that contain index vectors for the input and output arraysa and y, respectively.

Mask Vector Indexing

Indices kj, mj are such that:

my[kj] ≠ 0, ma[mj] ≠ 0 , j = 0,..., n-1,

where ma and my are arrays that contain mask vectors for the input and output arrays a andy, respectively.

Matrix ArgumentsMatrix arguments of the Intel® Math Kernel Library routines can be stored in either one- ortwo-dimensional arrays, using the following storage schemes:

• conventional full storage (in a two-dimensional array)

• packed storage for Hermitian, symmetric, or triangular matrices (in a one-dimensional array)

• band storage for band matrices (in a two-dimensional array).

Full storage is the following obvious scheme: a matrix A is stored in a two-dimensional arraya, with the matrix element aij stored in the array element a(i,j).

If a matrix is triangular (upper or lower, as specified by the argument uplo), only the elementsof the relevant triangle are stored; the remaining elements of the array need not be set.

Routines that handle symmetric or Hermitian matrices allow for either the upper or lower triangleof the matrix to be stored in the corresponding elements of the array:

aij is stored in a(i,j) for i ≤ j, other elements of a need not be set.if uplo ='U',

aij is stored in a(i,j) for j ≤ i, other elements of a need not be set.if uplo ='L',

2777

Routine and Function Arguments B

Packed storage allows you to store symmetric, Hermitian, or triangular matrices morecompactly: the relevant triangle (again, as specified by the argument uplo) is packed bycolumns in a one-dimensional array ap:

if uplo ='U', aij is stored in ap(i+j(j-1)/2) for i ≤ j

if uplo ='L', aij is stored in ap(i+(2*n-j)*(j-1)/2) for j ≤ i.

In descriptions of LAPACK routines, arrays with packed matrices have names ending in p.

Band storage is as follows: an m-by-n band matrix with kl non-zero sub-diagonals and kunon-zero super-diagonals is stored compactly in a two-dimensional array ab with kl+ku+1rows and n columns. Columns of the matrix are stored in the corresponding columns of thearray, and diagonals of the matrix are stored in rows of the array. Thus,

aij is stored in ab(ku+1+i-j,j) for max(1,j-ku) ≤ i ≤ min(n,j+kl).

Use the band storage scheme only when kl and ku are much less than the matrix size n.Although the routines work correctly for all values of kl and ku, using the band storage isinefficient if your matrices are not really banded.

The band storage scheme is illustrated by the following example, whenm = n = 6, kl = 2, ku = 1

Array elements marked * are not used by the routines:

2778


When a general band matrix is supplied for LU factorization, space must be allowed tostore kl additional super-diagonals generated by fill-in as a result of row interchanges. Thismeans that the matrix is stored according to the above scheme, but with kl + kusuper-diagonals. Thus,

aij is stored in ab(kl+ku+1+i-j,j) for max(1,j-ku) ≤ i ≤ min(n,j+kl).

The band storage scheme for LU factorization is illustrated by the following example, whenm= n = 6, kl = 2, ku = 1:

Array elements marked * are not used by the routines; elements marked + need not be seton entry, but are required by the LU factorization routines to store the results. The input arraywill be overwritten on exit by the details of the LU factorization as follows:

2779


where uij are the elements of the upper triangular matrix U, and mij are the multipliers usedduring factorization.

Triangular band matrices are stored in the same format, with either kl= 0 if upper triangular,or ku = 0 if lower triangular. For symmetric or Hermitian band matrices with k sub-diagonalsor super-diagonals, you need to store only the upper or lower triangle, as specified by theargument uplo:

if uplo ='U', aij is stored in ab(k+1+i-j,j) for max(1,j-k) ≤ i ≤ j

if uplo ='L', aij is stored in ab(1+i-j,j) for j ≤ i ≤ min(n,j+k).

In descriptions of LAPACK routines, arrays that hold matrices in band storage have namesending in b.

In Fortran, column-major ordering of storage is assumed. This means that elements of thesame column occupy successive storage locations.

Three quantities are usually associated with a two-dimensional array argument: its leadingdimension, which specifies the number of storage locations between elements in the same row,its number of rows, and its number of columns. For a matrix in full storage, the leading dimensionof the array must be at least as large as the number of rows in the matrix.

A character transposition parameter is often passed to indicate whether the matrix argumentis to be used in normal or transposed form or, for a complex matrix, if the conjugate transposeof the matrix is to be used.

The values of the transposition parameter for these three cases are the following:

normal (no conjugation, no transposition)'N' or 'n'transpose'T' or 't'conjugate transpose.'C' or 'c'

Example B-3 Two-Dimensional Complex Array

Suppose A (1:5, 1:4) is the complex two-dimensional array presented by matrix

2780


Let transa be the transposition parameter, m be the number of rows, n be the number ofcolumns, and lda be the leading dimension. Then if

transa = 'N', m = 4, n = 2, and lda = 5, the matrix argument would be

If transa = 'T', m = 4, n = 2, and lda =5, the matrix argument would be

If transa = 'C', m = 4, n = 2, and lda =5, the matrix argument would be

2781


Note that care should be taken when using a leading dimension value which is different fromthe number of rows specified in the declaration of the two-dimensional array. For example,suppose the array A above is declared as COMPLEX A (5,4).

Then if transa = 'N', m = 3, n = 4, and lda = 4, the matrix argument will be

2782


CCode Examples

This appendix presents code examples of using some Intel MKL routines and functions. You can find hereexample code written in both Fortran and C.

Currently, the appendix includes the following sections:

• BLAS Code Examples

• PARDISO Code Examples

• Direct Sparse Solver Code Examples

• Iterative Sparse Solver Code Examples

• Fourier Transform Functions Code Examples

• Interval Linear Solvers Code Examples

• PDE Support Code Examples.

Please refer to respective chapters in the manual for detailed descriptions of function parameters andoperation.

BLAS Code Examples

Example C-1. Using BLAS Level 1 Function

The following example illustrates a call to the BLAS Level 1 function sdot. This function performs avector-vector operation of computing a scalar product of two single-precision real vectors x and y.

Parameters

Specifies the order of vectors x and y.n

Specifies the increment for the elements of x.incx

Specifies the increment for the elements of y.incy

program dot_main

2783

real x(10), y(10), sdot, res

integer n, incx, incy, i

external sdot

n = 5

incx = 2

incy = 1

do i = 1, 10

x(i) = 2.0e0

y(i) = 1.0e0

end do

res = sdot (n, x, incx, y, incy)

print*, `SDOT = `, res

end

As a result of this program execution, the following line is printed:

SDOT = 10.000

Example C-2. Using BLAS Level 1 Routine

The following example illustrates a call to the BLAS Level 1 routine scopy. This routine performsa vector-vector operation of copying a single-precision real vector x to a vector y.

Parameters

Specifies the order of vectors x and y.n

Specifies the increment for the elements of x.incx

2784

C Intel® Math Kernel Library Reference Manual

Parameters

Specifies the increment for the elements of y.incy

program copy_main

real x(10), y(10)

integer n, incx, incy, i

n = 3

incx = 3

incy = 1

do i = 1, 10

x(i) = i

end do

call scopy (n, x, incx, y, incy)

print*, `Y = `, (y(i), i = 1, n)

end

As a result of this program execution, the following line is printed:

Y = 1.00000 4.00000 7.00000


The following example illustrates a call to the BLAS Level 2 routine sger. This routine performsa matrix-vector operation

a := alpha*x*y' + a.

Parameters

Specifies a scalar alpha.alpha

m-element vector.x

n-element vector.y

m-by-n matrix.a

program ger_main

2785

Code Examples C

real a(5,3), x(10), y(10), alpha

integer m, n, incx, incy, i, j, lda

m = 2

n = 3

lda = 5

incx = 2

incy = 1

alpha = 0.5

do i = 1, 10

x(i) = 1.0

y(i) = 1.0

end do

do i = 1, m

do j = 1, n

a(i,j) = j

end do

end do

call sger (m, n, alpha, x, incx, y, incy, a, lda)

print*, `Matrix A: `

do i = 1, m

print*, (a(i,j), j = 1, n)

end do

end

As a result of this program execution, matrix a is printed as follows:

Matrix A:

1.50000 2.50000 3.50000

1.50000 2.50000 3.50000

2786



The following example illustrates a call to the BLAS Level 3 routine ssymm. This routine performsa matrix-matrix operation

c := alpha*a*b' + beta*c.

Parameters

Specifies a scalar alpha.alpha

Specifies a scalar beta.beta

Symmetric matrix.a

m-by-n matrix.b

2787

Code Examples C

Parameters

m-by-n matrix.c

program symm_main

real a(3,3), b(3,2), c(3,3), alpha, beta

integer m, n, lda, ldb, ldc, i, j

character uplo, side

uplo = 'u'

side = 'l'

m = 3

n = 2

lda = 3

ldb = 3

ldc = 3

alpha = 0.5

beta = 2.0

do i = 1, m

do j = 1, m

a(i,j) = 1.0

end do

end do

do i = 1, m

do j = 1, n

c(i,j) = 1.0

b(i,j) = 2.0

end do

end do

call ssymm (side, uplo, m, n, alpha,

a, lda, b, ldb, beta, c, ldc)

2788


print*, `Matrix C: `

do i = 1, m

print*, (c(i,j), j = 1, n)

end do

end

As a result of this program execution, matrix c is printed as follows:

Matrix C:

5.00000 5.00000

5.00000 5.00000

5.00000 5.00000

Example C-5. Calling a Complex BLAS Level 1 Function from C

The following example illustrates a call from a C program to the complex BLAS Level 1 functionzdotc(). This function computes the dot product of two double-precision complex vectors.

In this example, the complex dot product is returned in the structure c.

#define N 5

void main()

{

2789

Code Examples C

int n, inca = 1, incb = 1, i;

typedef struct{ double re; double im; } complex16;

complex16 a[N], b[N], c;

void zdotc();

n = N;

for( i = 0; i < n; i++ ){

a[i].re = (double)i; a[i].im = (double)i * 2.0;

b[i].re = (double)(n - i); b[i].im = (double)i * 2.0;

}

zdotc( &c, &n, a, &inca, b, &incb );

printf( "The complex dot product is: ( %6.2f, %6.2f )\n", c.re, c.im );

}

NOTE. Instead of calling BLAS directly from C programs, you might wish to use theCBLAS interface; this is the supported way of calling BLAS from C. For more informationabout CBLAS, see Appendix D , which presents CBLAS, the C interface to the Basic LinearAlgebra Subprograms (BLAS) implemented in Intel® MKL.

PARDISO Code ExamplesThis section presents code examples of using the PARDISO direct solver for computing solutionsof linear systems with sparse matrices. For description of this solver, refer to Chapter 8 of themanual .

Examples for Sparse Symmetric Linear Systems

In this section two examples (Fortran, C) are provided to solve symmetric linear systems withPARDISO. To solve the systems of equations Ax = b, where

2790

Example Results for Symmetric Systems

Upon successful execution of the solver, the result of the solution X is as follows

Reordering completed ...

Number of nonzeros in factors = 30

Number of factorization MFLOPS = 0

Factorization completed ...

Solve completed ...

The solution of the system is

x(1) = -0.0418602013

x(2) = -0.00341312416

x(3) = 0.117250377

x(4) = -0.11263958

x(5) = 0.0241722445

x(6) = -0.10763334

x(7) = 0.198719673

x(8) = 0.190382964

2791

Code Examples C

Example C-6. pardiso_sym.f for Symmetric Linear SystemsC----------------------------------------------------------------------

C Example program to show the use of the "PARDISO" routine

C for symmetric linear systems

C---------------------------------------------------------------------

C This program can be downloaded from the following site:

C http://www.computational.unibas.ch/cs/scicomp

C

C (C) Olaf Schenk, Department of Computer Science,

C University of Basel, Switzerland.

C Email: [email protected]

C

C---------------------------------------------------------------------

PROGRAM pardiso_sym

IMPLICIT NONE

C.. Internal solver memory pointer for 64-bit architectures

C.. INTEGER*8 pt(64)



C.. This is OK in both cases

INTEGER*8 pt(64)

C.. All other variables

INTEGER maxfct, mnum, mtype, phase, n, nrhs, error, msglvl

INTEGER iparm(64)

INTEGER ia(9)

INTEGER ja(18)

REAL*8 a(18)

REAL*8 b(8)

2792


REAL*8 x(8)

INTEGER i, idum

REAL*8 waltime1, waltime2, ddum

C.. Fill all arrays containing matrix data.

DATA n /8/, nrhs /1/, maxfct /1/, mnum /1/

DATA ia /1,5,8,10,12,15,17,18,19/

DATA ja

1 /1, 3, 6,7,

2 2,3, 5,

3 3, 8,

4 4, 7,

5 5,6,7,

6 6, 8,

7 7,

8 8/

DATA a

1 /7.d0, 1.d0, 2.d0,7.d0,

2 -4.d0,8.d0, 2.d0,

3 1.d0, 5.d0,

4 7.d0, 9.d0,

5 5.d0,1.d0,5.d0,

6 -1.d0, 5.d0,

7 11.d0,

8 5.d0/

integer omp_get_max_threads

external omp_get_max_threads

C..

C.. Set up PARDISO control parameter

2793

Code Examples C

C..

do i = 1, 64

iparm(i) = 0

end do

iparm(1) = 1 ! no solver default

iparm(2) = 2 ! fill-in reordering from METIS

iparm(3) = omp_get_max_threads() !numbers of processors, value of !OMP_NUM_THREADS

iparm(4) = 0 ! no iterative-direct algorithm

iparm(5) = 0 ! no user fill-in reducing permutation

iparm(6) = 0 ! =0 solution on the first n compoments of x

iparm(7) = 16 ! default logical fortran unit number for output

iparm(8) = 9 ! numbers of iterative refinement steps

iparm(9) = 0 ! not in use

iparm(10) = 13 ! perturbe the pivot elements with 1E-13

iparm(11) = 1 ! use nonsymmetric permutation and scaling MPS



iparm(14) = 0 ! Output: number of perturbed pivots




iparm(18) = -1 ! Output: number of nonzeros in the factor LU

iparm(19) = -1 ! Output: Mflops for LU factorization

iparm(20) = 0 ! Output: Numbers of CG Iterations

error = 0 ! initialize error flag

msglvl = 0 ! don't print statistical information

mtype = -2 ! unsymmetric matrix symmetric, indefinite, no pivoting

C.. Initiliaze the internal solver memory pointer. This is only

2794


C necessary for the FIRST call of the PARDISO solver.

do i = 1, 64

pt(i) = 0

end do

C.. Reordering and Symbolic Factorization, This step also allocates

C all memory that is necessary for the factorization

phase = 11 ! only reordering and symbolic factorization

CALL pardiso (pt, maxfct, mnum, mtype, phase, n, a, ia, ja,

1 idum, nrhs, iparm, msglvl, ddum, ddum, error)

WRITE(*,*) 'Reordering completed ... '

IF (error .NE. 0) THEN

WRITE(*,*) 'The following ERROR was detected: ', error

STOP

END IF

WRITE(*,*) 'Number of nonzeros in factors = ',iparm(18)

WRITE(*,*) 'Number of factorization MFLOPS = ',iparm(19)

C.. Factorization.

phase = 22 ! only factorization



WRITE(*,*) 'Factorization completed ... '



STOP

ENDIF

C.. Back substitution and iterative refinement

iparm(8) = 2 ! max numbers of iterative refinement steps


2795

Code Examples C

do i = 1, n

b(i) = 1.d0

end do


1 idum, nrhs, iparm, msglvl, b, x, error)

WRITE(*,*) 'Solve completed ... '

WRITE(*,*) 'The solution of the system is '

DO i = 1, n

WRITE(*,*) ' x(',i,') = ', x(i)

END DO

C.. Termination and release of memory

phase = -1 ! release internal memory

CALL pardiso (pt, maxfct, mnum, mtype, phase, n, ddum, idum, idum,


END

2796


Example C-7. pardiso_sym.c for Symmetric Linear Systems/* --------------------------------------------------------------------*/

/* Example program to show the use of the "PARDISO" routine */

/* on symmetric linear systems*/

/* --------------------------------------------------------------------*/

/* This program can be downloaded from the following site: */

/* http://www.computational.unibas.ch/cs/scicomp*/

/* */

/* (C) Olaf Schenk, Department of Computer Science, */

/* University of Basel, Switzerland.*/

/* Email: [email protected]*/

/* --------------------------------------------------------------------*/

#include <stdio.h>

#include <stdlib.h>

#include <math.h>

extern int omp_get_max_threads();

/* PARDISO prototype. */

extern int PARDISO

(void *, int *, int *, int *, int *, int *,

double *, int *, int *, int*, int *, int *,

int *, double *, double *, int*);

int main( void ) {

/* Matrix data. */

int n = 8;

int ia[ 9] = { 1, 5, 8, 10, 12, 15, 17, 18, 19 };

int ja[18] = { 1, 3, 6, 7,

2, 3, 5,

2797

Code Examples C

3, 8,

4, 7,

5, 6, 7,

6, 8,

7,

8 };

double a[18] = { 7.0, 1.0, 2.0, 7.0,

-4.0, 8.0, 2.0,

1.0, 5.0,

7.0, 9.0,

5.0, 1.0, 5.0,

-1.0, 5.0,

11.0,

5.0 };

int mtype = -2; /* Real symmetric matrix */

/* RHS and solution vectors.*/

double b[8], x[8];

int nrhs = 1; /* Number of right hand sides. */

/* Internal solver memory pointer pt, */

/* 32-bit: int pt[64]; 64-bit: long int pt[64] */

/* or void *pt[64] should be OK on both architectures */

void *pt[64];

/* Pardiso control parameters.*/

int iparm[64];

int maxfct, mnum, phase, error, msglvl;

/* Auxiliary variables. */

int i;

double ddum; /* Double dummy*/

2798


int idum; /* Integer dummy.*/

/* --------------------------------------------------------------------*/

/* .. Setup Pardiso control parameters.*/

/* --------------------------------------------------------------------*/

for (i = 0; i < 64; i++) {

iparm[i] = 0;

}

iparm[0] = 1; /* No solver default*/

iparm[1] = 2; /* Fill-in reordering from METIS */

/* Numbers of processors, value of OMP_NUM_THREADS */

iparm[2] = omp_get_max_threads();

iparm[3] = 0; /* No iterative-direct algorithm */

iparm[4] = 0; /* No user fill-in reducing permutation */

iparm[5] = 0; /* Write solution into x */

iparm[6] = 16; /* Default logical fortran unit number for output */

iparm[7] = 2; /* Max numbers of iterative refinement steps */

iparm[8] = 0; /* Not in use*/

iparm[9] = 13; /* Perturb the pivot elements with 1E-13 */

iparm[10] = 1; /* Use nonsymmetric permutation and scaling MPS */



iparm[13] = 0; /* Output: Number of perturbed pivots */




iparm[17] = -1; /* Output: Number of nonzeros in the factor LU */

iparm[18] = -1; /* Output: Mflops for LU factorization */

iparm[19] = 0; /* Output: Numbers of CG Iterations */

2799

Code Examples C

maxfct = 1; /* Maximum number of numerical factorizations. */

mnum = 1; /* Which factorization to use. */

msglvl = 0; /* Don't print statistical information in file */

error = 0; /* Initialize error flag */

/* --------------------------------------------------------------------*/

/* .. Initialize the internal solver memory pointer. This is only */

/* necessary for the FIRST call of the PARDISO solver. */

/* --------------------------------------------------------------------*/

for (i = 0; i < 64; i++) {

pt[i] = 0;

}

/* --------------------------------------------------------------------*/

/* .. Reordering and Symbolic Factorization. This step also allocates */

/* all memory that is necessary for the factorization. */

/* --------------------------------------------------------------------*/

phase = 11;

PARDISO (pt, &maxfct, &mnum, &mtype, &phase,

&n, a, ia, ja, &idum, &nrhs,

iparm, &msglvl, &ddum, &ddum, &error);

if (error != 0) {

printf("\nERROR during symbolic factorization: %d", error);

exit(1);

}

printf("\nReordering completed ... ");

printf("\nNumber of nonzeros in factors = %d", iparm[17]);

printf("\nNumber of factorization MFLOPS = %d", iparm[18]);

/* --------------------------------------------------------------------*/

/* .. Numerical factorization.*/

2800

/* --------------------------------------------------------------------*/

phase = 22;




if (error != 0) {

printf("\nERROR during numerical factorization: %d", error);

exit(2);

}

printf("\nFactorization completed ... ");

/* --------------------------------------------------------------------*/

/* .. Back substitution and iterative refinement. */

/* --------------------------------------------------------------------*/

phase = 33;

iparm[7] = 2; /* Max numbers of iterative refinement steps. */

/* Set right hand side to one.*/

for (i = 0; i < n; i++) {

b[i] = 1;

}



iparm, &msglvl, b, x, &error);

if (error != 0) {

printf("\nERROR during solution: %d", error);

exit(3);

}

printf("\nSolve completed ... ");

printf("\nThe solution of the system is: ");

2801

Code Examples C

for (i = 0; i < n; i++) {

printf("\n x [%d] = % f", i, x[i] );

}

printf ("\n");

/* --------------------------------------------------------------------*/

/* .. Termination and release of memory. */

/* --------------------------------------------------------------------*/

phase = -1; /* Release internal memory. */


&n, &ddum, ia, ja, &idum, &nrhs,


return 0;

}

Examples for Sparse Unsymmetric Linear Systems

In this section two examples (Fortran, C) are provided to solve unsymmetric linear systemswith PARDISO. To solve the systems of equations Ax = b, where

2802

Example Results for Unsymmetric Systems

Upon successful execution of the solver, the result of the solution X is as follows

Reordering completed ...

Number of nonzeros in factors = 21

Number of factorization MFLOPS = 0

Factorization completed ...

Solve completed ...

The solution of the system is

x( 1) = -0.522321429

x( 2) = -0.00892857143

x( 3) = 1.22098214

x( 4) = -0.504464286

x( 5) = -0.214285714

2803

Code Examples C

Example C-8. pardiso_unsym.f for Unsymmetric Linear Systems*******************************************************************************

* Copyright(C) 2004 Intel Corporation. All Rights Reserved.

* The source code contained or described herein and all documents related to

* the source code ("Material") are owned by Intel Corporation or itssuppliers

* or licensors. Title to the Material remains with Intel Corporation orits

* suppliers and licensors. The Material contains trade secrets andproprietary

* and confidential information of Intel or its suppliers and licensors.The

* Material is protected by worldwide copyright and trade secret lawsand

* treaty provisions. No part of the Material may be used, copied,reproduced,

* modified, published, uploaded, posted, transmitted, distributed ordisclosed

* in any way without Intel's prior express written permission.

* No license under any patent, copyright, trade secret or otherintellectual

* property right is granted to or conferred upon you by disclosure ordelivery

* of the Materials, either expressly, by implication, inducement, estoppelor

* otherwise. Any license under such intellectual property rights mustbe

* express and approved by Intel in writing.

*

********************************************************************************

* Content : MKL DSS Fortran-77 example

*

********************************************************************************

2804


C----------------------------------------------------------------------

C Example program to show the use of the "PARDISO" routine

C for symmetric linear systems

C---------------------------------------------------------------------

C This program can be downloaded from the following site:

C http://www.computational.unibas.ch/cs/scicomp

C

C (C) Olaf Schenk, Department of Computer Science,

C University of Basel, Switzerland.

C Email: [email protected]

C

C---------------------------------------------------------------------

PROGRAM pardiso_unsym

IMPLICIT NONE





C.. This is OK in both cases

INTEGER*8 pt(64)

C.. All other variables

INTEGER maxfct, mnum, mtype, phase, n, nrhs, error, msglvl

INTEGER iparm(64)

INTEGER ia(6)

INTEGER ja(13)

REAL*8 a(13)

REAL*8 b(5)

REAL*8 x(5)

2805

Code Examples C

INTEGER i, idum

REAL*8 waltime1, waltime2, ddum


DATA n /5/, nrhs /1/, maxfct /1/, mnum /1/

DATA ia /1,4,6,9,12,14/

DATA ja

1 / 1, 2, 4,

2 1, 2,

3 3, 4, 5,

4 1, 3, 4,

5 2, 5/

DATA a

1 /1.d0,-1.d0, -3.d0,

2 -2.d0, 5.d0,

3 4.d0, 6.d0, 4.d0,

4 -4.d0, 2.d0, 7.d0,

5 8.d0, -5.d0/

integer omp_get_max_threads

external omp_get_max_threads

C..

C.. Set up PARDISO control parameter

C..

do i = 1, 64

iparm(i) = 0

end do

iparm(1) = 1 ! no solver default

iparm(2) = 2 ! fill-in reordering from METIS

iparm(3) = omp_get_max_threads() ! numbers of processors, value ofOMP_NUM_THREADS

2806


iparm(4) = 0 ! no iterative-direct algorithm

iparm(5) = 0 ! no user fill-in reducing permutation

iparm(6) = 0 ! =0 solution on the first n compoments of x


iparm(8) = 9 ! numbers of iterative refinement steps


iparm(10) = 13 ! perturbe the pivot elements with 1E-13

iparm(11) = 1 ! use nonsymmetric permutation and scaling MPS



iparm(14) = 0 ! Output: number of perturbed pivots




iparm(18) = -1 ! Output: number of nonzeros in the factor LU

iparm(19) = -1 ! Output: Mflops for LU factorization

iparm(20) = 0 ! Output: Numbers of CG Iterations

error = 0 ! initialize error flag

msglvl = 1 ! print statistical information

mtype = 11 ! real unsymmetric

C.. Initiliaze the internal solver memory pointer. This is only

C necessary for the FIRST call of the PARDISO solver.

do i = 1, 64

pt(i) = 0

end do

C.. Reordering and Symbolic Factorization, This step also allocates

C all memory that is necessary for the factorization

phase = 11 ! only reordering and symbolic factorization

2807

Code Examples C



WRITE(*,*) 'Reordering completed ... '



STOP

END IF

WRITE(*,*) 'Number of nonzeros in factors = ',iparm(18)

WRITE(*,*) 'Number of factorization MFLOPS = ',iparm(19)

C.. Factorization.




WRITE(*,*) 'Factorization completed ... '



STOP

ENDIF

C.. Back substitution and iterative refinement

iparm(8) = 2 ! max numbers of iterative refinement steps


do i = 1, n

b(i) = 1.d0

end do


1 idum, nrhs, iparm, msglvl, b, x, error)

WRITE(*,*) 'Solve completed ... '

WRITE(*,*) 'The solution of the system is '

2808


DO i = 1, n

WRITE(*,*) ' x(',i,') = ', x(i)

END DO

C.. Termination and release of memory

phase = -1 ! release internal memory

CALL pardiso (pt, maxfct, mnum, mtype, phase, n, ddum, idum, idum,


END

2809

Code Examples C

Example C-9. pardiso_unsym.c for Unsymmetric Linear Systems/*

********************************************************************************

* Copyright(C) 2004 Intel Corporation. All Rights Reserved.

* The source code contained or described herein and all documents relatedto














*

********************************************************************************

* Content : MKL DSS C example

*

2810


********************************************************************************

*/

/* --------------------------------------------------------------------*/

/* Example program to show the use of the "PARDISO" routine */

/* on symmetric linear systems*/

/* --------------------------------------------------------------------*/

/* This program can be downloaded from the following site: */

/* http://www.computational.unibas.ch/cs/scicomp*/

/* */

/* (C) Olaf Schenk, Department of Computer Science, */

/* University of Basel, Switzerland.*/

/* Email: [email protected]*/

/* --------------------------------------------------------------------*/

#include <stdio.h>

#include <stdlib.h>

#include <math.h>

extern int omp_get_max_threads();

/* PARDISO prototype. */

#if defined(_WIN32) || defined(_WIN64)

#define pardiso_ PARDISO

#else

#define PARDISO pardiso_

#endif

extern int PARDISO

(void *, int *, int *, int *, int *, int *,

double *, int *, int*, int *, int *, int *,

int *, double *, double*, int *);

2811

Code Examples C

int main( void ) {

/* Matrix data. */

int n = 5;

int ia[ 6] = { 1, 4, 6, 9, 12, 14 };

int ja[13] = { 1, 2, 4,

1, 2,

3, 4, 5,

1, 3, 4,

2, 5 };

double a[18] = { 1.0, -1.0, -3.0,

-2.0, 5.0,

4.0, 6.0, 4.0,

-4.0, 2.0, 7.0,

8.0, -5.0 };

int mtype = 11; /* Real unsymmetric matrix */

/* RHS and solution vectors.*/

double b[5], x[5];

int nrhs = 1; /* Number of right hand sides. */

/* Internal solver memory pointer pt, */

/* 32-bit: int pt[64]; 64-bit: long int pt[64] */

/* or void *pt[64] should be OK on both architectures */

void *pt[64];

/* Pardiso control parameters.*/

int iparm[64];

int maxfct, mnum, phase, error, msglvl;

/* Auxiliary variables.*/

int i;

double ddum; /* Double dummy */

2812


int idum; /* Integer dummy. */

/* --------------------------------------------------------------------*/

/* .. Setup Pardiso control parameters.*/

/* --------------------------------------------------------------------*/

for (i = 0; i < 64; i++) {

iparm[i] = 0;

}

iparm[0] = 1; /* No solver default */

iparm[1] = 2; /* Fill-in reordering from METIS */

/* Numbers of processors, value of OMP_NUM_THREADS */

iparm[2] = omp_get_max_threads();

iparm[3] = 0; /* No iterative-direct algorithm */

iparm[4] = 0; /* No user fill-in reducing permutation */

iparm[5] = 0; /* Write solution into x */

iparm[6] = 0; /* Not in use */

iparm[7] = 2; /* Max numbers of iterative refinement steps */


iparm[9] = 13; /* Perturb the pivot elements with 1E-13 */

iparm[10] = 1; /* Use nonsymmetric permutation and scaling MPS */



iparm[13] = 0; /* Output: Number of perturbed pivots */




iparm[17] = -1; /* Output: Number of nonzeros in the factor LU */

iparm[18] = -1; /* Output: Mflops for LU factorization */

iparm[19] = 0; /* Output: Numbers of CG Iterations */

2813

Code Examples C

maxfct = 1; /* Maximum number of numerical factorizations. */

mnum = 1; /* Which factorization to use. */

msglvl = 1; /* Print statistical information in file */

error = 0; /* Initialize error flag */

/* --------------------------------------------------------------------*/

/* .. Initialize the internal solver memory pointer. This is only */

/* necessary for the FIRST call of the PARDISO solver. */

/* --------------------------------------------------------------------*/

for (i = 0; i < 64; i++) {

pt[i] = 0;

}

/* --------------------------------------------------------------------*/

/* .. Reordering and Symbolic Factorization. This step also allocates */

/* all memory that is necessary for the factorization. */

/* --------------------------------------------------------------------*/

phase = 11;




if (error != 0) {

printf("\nERROR during symbolic factorization: %d", error);

exit(1);

}

printf("\nReordering completed ... ");

printf("\nNumber of nonzeros in factors = %d", iparm[17]);

printf("\nNumber of factorization MFLOPS = %d", iparm[18]);

/* --------------------------------------------------------------------*/

/* .. Numerical factorization.*/

2814

/* --------------------------------------------------------------------*/

phase = 22;




if (error != 0) {

printf("\nERROR during numerical factorization: %d", error);

exit(2);

}

printf("\nFactorization completed ... ");

/* --------------------------------------------------------------------*/

/* .. Back substitution and iterative refinement. */

/* --------------------------------------------------------------------*/

phase = 33;

iparm[7] = 2; /* Max numbers of iterative refinement steps. */

/* Set right hand side to one. */

for (i = 0; i < n; i++) {

b[i] = 1;

}



iparm, &msglvl, b, x, &error);

if (error != 0) {

printf("\nERROR during solution: %d", error);

exit(3);

}

printf("\nSolve completed ... ");

printf("\nThe solution of the system is: ");

2815

Code Examples C

for (i = 0; i < n; i++) {

printf("\n x [%d] = % f", i, x[i] );

}

printf ("\n");

/* --------------------------------------------------------------------*/

/* .. Termination and release of memory. */

/* --------------------------------------------------------------------*/

phase = -1; /* Release internal memory. */


&n, &ddum, ia, ja, &idum, &nrhs,


return 0;

}

Direct Sparse Solver Code ExamplesThis section contains example code in Fortran 77, Fortran 90 and C. For description of thesparse solver routines used in this code, refer to “Direct Sparse Solver (DSS) Interface Routines”in Chapter 8 of the manual . The example code solves the equations presented in Direct Methodsection of Appendix A - a symmetric positive definite system of equations Ax= b with a sparsematrix, where

2816

Example results for symmetric systems

Upon successful execution of the solver, the determinant and the result of the solution arrayare as follows

pow of determinant is 0.000

base of determinant is 2.250

Determinant is 2.250

Solution Array: -326.333 983.000 163.417 398.000 61.500

2817

Code Examples C

Example C-10. Fortran 77 example to Solve Symmetric Positive DefiniteSystem********************************************************************************

* Copyright(C) 2001-2004 Intel Corporation. All Rights Reserved.















*

********************************************************************************

* Content : Intel MKL DSS Fortran-77 example

*

2818


********************************************************************************

C---------------------------------------------------------------------------

C Example program for solving symmetric positive definite system of

C equations.

C---------------------------------------------------------------------------

PROGRAM solver_f77_test

IMPLICIT NONE

INCLUDE 'mkl_dss.f77'

C---------------------------------------------------------------------------

C Define the array and rhs vectors

C---------------------------------------------------------------------------

INTEGER nRows, nCols, nNonZeros, i, nRhs

PARAMETER (nRows = 5,

1 nCols = 5,

2 nNonZeros = 9,

3 nRhs = 1)

INTEGER rowIndex(nRows + 1), columns(nNonZeros)

DOUBLE PRECISION values(nNonZeros), rhs(nRows)

DATA rowIndex / 1, 6, 7, 8, 9, 10 /

DATA columns / 1, 2, 3, 4, 5, 2, 3, 4, 5 /

DATA values / 9, 1.5, 6, .75, 3, 0.5, 12, .625, 16 /

DATA rhs / 1, 2, 3, 4, 5 /

C---------------------------------------------------------------------------

C Allocate storage for the solver handle and the solution vector

C---------------------------------------------------------------------------

DOUBLE PRECISION solution(nRows)

INTEGER*8 handle

INTEGER error

2819

Code Examples C

CHARACTER*15 statIn

DOUBLE PRECISION statOut(5)

INTEGER bufLen

PARAMETER(bufLen = 20)

INTEGER buff(bufLen)

C---------------------------------------------------------------------------

C Initialize the solver

C---------------------------------------------------------------------------

error = dss_create(handle, MKL_DSS_DEFAULTS)

IF (error .NE. MKL_DSS_SUCCESS ) GOTO 999

C---------------------------------------------------------------------------

C Define the non-zero structure of the matrix

C---------------------------------------------------------------------------

error = dss_define_structure( handle, MKL_DSS_SYMMETRIC,

& rowIndex, nRows, nCols, columns, nNonZeros )


C---------------------------------------------------------------------------

C Reorder the matrix

C---------------------------------------------------------------------------

error = dss_reorder( handle, MKL_DSS_DEFAULTS, 0)


C---------------------------------------------------------------------------

C Factor the matrix

C---------------------------------------------------------------------------

error = dss_factor_real( handle,

& MKL_DSS_DEFAULTS, VALUES)


C---------------------------------------------------------------------------

2820


C Get the solution vector

C---------------------------------------------------------------------------

error = dss_solve_real( handle, MKL_DSS_DEFAULTS,

& rhs, nRhs, solution)


C---------------------------------------------------------------------------

C Print Determinant of the matrix

C---------------------------------------------------------------------------

statIn = 'determinant'

call mkl_cvt_to_null_terminated_str(buff,bufLen,statIn)

error = dss_statistics(handle, MKL_DSS_DEFAULTS,

& buff,statOut)

WRITE(*,"(' pow of determinant is ', 5(F10.3))") statOut(1)

WRITE(*,"(' base of determinant is ', 5(F10.3))") statOut(2)

WRITE(*,"(' Determinant is ', 5(F10.3))")(10**statOut(1))*

& statOut(2)

C---------------------------------------------------------------------------

C Deallocate solver storage

C---------------------------------------------------------------------------

error = dss_delete( handle, MKL_DSS_DEFAULTS )


C---------------------------------------------------------------------------

C Print solution vector

C---------------------------------------------------------------------------

WRITE(*,900) (solution(i), i = 1, nCols)

900 FORMAT(' Solution Array: ',5(F10.3))

GOTO 1000

999 WRITE(*,*) "Solver returned error code ", error

2821

Code Examples C

1000 END

2822


Example C-11. C Example to Solve Symmetric Positive Definite System/*

*******************************************************************************
















*

********************************************************************************

* Content : Intel MKL DSS C example

*

2823

Code Examples C

********************************************************************************/

/*

** Example program to solve symmetric positive definite system of equations.

*/

#include<stdio.h>

#include<stdlib.h>

#include<math.h>

#include "mkl_dss.h"

/*

** Define the array and rhs vectors

*/

#define NROWS 5

#define NCOLS 5

#define NNONZEROS 9

#define NRHS 1

static const int nRows = NROWS ;

static const int nCols = NCOLS ;

static const int nNonZeros = NNONZEROS ;

static const int nRhs = NRHS ;

static _INTEGER_t rowIndex[NROWS+1] = { 1, 6, 7, 8, 9, 10 };

static _INTEGER_t columns[NNONZEROS] = { 1, 2, 3, 4, 5, 2, 3, 4, 5 };

static _DOUBLE_PRECISION_t values[NNONZEROS] = { 9, 1.5, 6, .75, 3, 0.5, 12,.625, 16 };

static _DOUBLE_PRECISION_t rhs[NCOLS] = { 1, 2, 3, 4, 5 };

void main() {

int i;

/* Allocate storage for the solver handle and the right-hand side. */

2824

_DOUBLE_PRECISION_t solValues[NROWS];

_MKL_DSS_HANDLE_t handle;

_INTEGER_t error;

_CHARACTER_STR_t statIn[] = "determinant";

_DOUBLE_PRECISION_t statOut[5];

int opt = MKL_DSS_DEFAULTS;

int sym = MKL_DSS_SYMMETRIC;

int type = MKL_DSS_POSITIVE_DEFINITE;

/* ---------------------*/

/* Initialize the solver */

/* ---------------------*/

error = dss_create(handle, opt );

if ( error != MKL_DSS_SUCCESS ) goto printError;

/* -------------------------------------------*/

/* Define the non-zero structure of the matrix */

/* -------------------------------------------*/

error = dss_define_structure(

handle, sym, rowIndex, nRows, nCols,

columns, nNonZeros );


/* ------------------*/

/* Reorder the matrix */

/* ------------------*/

error = dss_reorder(handle, opt, 0);


/* ------------------*/

/* Factor the matrix */

/* ------------------*/

2825

Code Examples C

error = dss_factor_real(handle, type, values );


/* ------------------------*/

/* Get the solution vector */

/* ------------------------*/

error = dss_solve_real(handle, opt, rhs, nRhs, solValues );


/* ------------------------*/

/* Get the determinant*/

/*--------------------------*/

error = dss_statistics(handle, opt, statIn, statOut);


/*-------------------------*/

/* print determinant*/

/*-------------------------*/

printf(" determinant power is %g \n", statOut[0]);

printf(" determinant base is %g \n", statOut[1]);

printf(" Determinant is %g \n", (pow(10.0,statOut[0]))*statOut[1]);

free((void*) statIn);

/* --------------------------*/

/* Deallocate solver storage */

/* --------------------------*/

error = dss_delete(handle, opt );


/* ----------------------*/

/* Print solution vector */

/* ----------------------*/

printf(" Solution array: ");

2826


for(i = 0; i< nCols; i++)

printf(" %g", solValues[i] );

printf("\n");

exit(0);

printError:

printf("Solver returned error code %d\n", error);

exit(1);

}

2827

Code Examples C

Example C-12. Fortran 90 Example to Solve Symmetric Positive DefiniteSystem!*******************************************************************************

! Copyright(C) 2001-2004 Intel Corporation. All Rights Reserved.

! The source code contained or described herein and all documents relatedto

! the source code ("Material") are owned by Intel Corporation or itssuppliers

! or licensors. Title to the Material remains with Intel Corporation orits

! suppliers and licensors. The Material contains trade secrets andproprietary

! and confidential information of Intel or its suppliers and licensors.The

! Material is protected by worldwide copyright and trade secret lawsand

! treaty provisions. No part of the Material may be used, copied,reproduced,

! modified, published, uploaded, posted, transmitted, distributed ordisclosed

! in any way without Intel's prior express written permission.

! No license under any patent, copyright, trade secret or otherintellectual

! property right is granted to or conferred upon you by disclosure ordelivery

! of the Materials, either expressly, by implication, inducement, estoppelor

! otherwise. Any license under such intellectual property rights mustbe

! express and approved by Intel in writing.

!

!*******************************************************************************

! Content : Intel MKL DSS Fortran-90 example

!

2828


!*******************************************************************************

!--------------------------------------------------------------------------

!

! Example program for solving a symmetric positive definite system of

! equations.

!

!--------------------------------------------------------------------------

INCLUDE 'mkl_dss.f90' ! Include the standard DSS "header file."

PROGRAM solver_f90_test

use mkl_dss

IMPLICIT NONE

INTEGER, PARAMETER :: dp = KIND(1.0D0)

INTEGER :: error

INTEGER :: i

INTEGER, PARAMETER :: bufLen = 20

! Define the data arrays and the solution and rhs vectors.

INTEGER, ALLOCATABLE :: columns( : )

INTEGER :: nCols

INTEGER :: nNonZeros

INTEGER :: nRhs

INTEGER :: nRows

REAL(KIND=DP), ALLOCATABLE :: rhs( : )

INTEGER, ALLOCATABLE :: rowIndex( : )

REAL(KIND=DP), ALLOCATABLE :: solution( : )

REAL(KIND=DP), ALLOCATABLE :: values( : )

TYPE(MKL_DSS_HANDLE) :: handle ! Allocate storage for the solver handle.

REAL(KIND=DP),ALLOCATABLE::statOUt( : )

CHARACTER*15 statIn

2829

Code Examples C

INTEGER perm(1)

INTEGER buff(bufLen)

EXTERNAL MKL_CVT_TO_NULL_TERMINATED_STR

! Set the problem to be solved.

nRows = 5

nCols = 5

nNonZeros = 9

nRhs = 1

perm(1) = 0

ALLOCATE( rowIndex(nRows + 1 ) )

rowIndex = (/ 1, 6, 7, 8, 9, 10 /)

ALLOCATE( columns(nNonZeros ) )

columns = (/ 1, 2, 3, 4, 5, 2, 3, 4, 5 /)

ALLOCATE( values( nNonZeros ) )

values = (/ 9.0_DP, 1.5_DP, 6.0_DP, 0.75_DP, 3.0_DP, 0.5_DP, 12.0_DP, &

& 0.625_DP, 16.0_DP /)

ALLOCATE( rhs( nRows ) )

rhs = (/ 1.0_DP, 2.0_DP, 3.0_DP, 4.0_DP, 5.0_DP /)

! Initialize the solver.

error = dss_create( handle, MKL_DSS_DEFAULTS )

IF (error /= MKL_DSS_SUCCESS) GOTO 999

! Define the non-zero structure of the matrix.

error = dss_define_structure( handle, MKL_DSS_SYMMETRIC, rowIndex, nRows, &

& nCols, columns, nNonZeros )


! Reorder the matrix.

error = dss_reorder( handle, MKL_DSS_DEFAULTS, perm )


2830


! Factor the matrix.

error = dss_factor_real( handle, MKL_DSS_DEFAULTS, values )


! Allocate the solution vector and solve the problem.

ALLOCATE( solution( nRows ) )

error = dss_solve_real(handle, MKL_DSS_DEFAULTS, rhs, nRhs, solution )


! Print Out the determinant of the matrix

ALLOCATE(statOut( 5 ) )

statIn = 'determinant'

call mkl_cvt_to_null_terminated_str(buff,bufLen,statIn);

error = dss_statistics(handle, MKL_DSS_DEFAULTS, buff, statOut )


WRITE(*,"('pow of determinant is '(5F10.3))") ( statOut(1) )

WRITE(*,"('base of determinant is '(5F10.3))") ( statOut(2) )

WRITE(*,"('Determinant is '(5F10.3))") ( (10**statOut(1))*statOut(2) )

! Deallocate solver storage and various local arrays.

error = dss_delete( handle, MKL_DSS_DEFAULTS )

IF (error /= MKL_DSS_SUCCESS ) GOTO 999

IF ( ALLOCATED( rowIndex) ) DEALLOCATE( rowIndex )

IF ( ALLOCATED( columns ) ) DEALLOCATE( columns )

IF ( ALLOCATED( values ) ) DEALLOCATE( values )

IF ( ALLOCATED( rhs ) ) DEALLOCATE( rhs )

IF ( ALLOCATED( statOut ) ) DEALLOCATE( statOut )

! Print the solution vector, deallocate it and exit

WRITE(*,"('Solution Array: '(5F10.3))") ( solution(i), i = 1, nCols )

IF ( ALLOCATED( solution ) ) DEALLOCATE( solution )

GOTO 1000

2831

Code Examples C

! Print an error message and exit

999 WRITE(*,*) "Solver returned error code ", error

1000 CONTINUE

END PROGRAM solver_f90_test

Iterative Sparse Solver Code ExamplesThis section contains example code in Fortran 77 and C. For description of the iterative sparsesolver routines based on the reverse communication interface (RCI ISS) used in this code, referto “Iterative Sparse Solvers based on Reverse Communication Interface (RCI ISS)” in Chapter8 of the manual .

Example of Use RCI (Preconditioned) Conjugate Gradient Solver

Example results for symmetric positive definite systems. Upon successful execution of thesolver, the result of the solution array is as follows:

The system is successfully solved

The following solution obtained

1.000 0.000 1.000 0.000

1.000 0.000 1.000 0.000

Expected solution

1.000 0.000 1.000 0.000

1.000 0.000 1.000 0.000

Number of iterations: 8

2832


Example C-13a. Fortran 77 Example to Solve Symmetric PositiveDefinite System****************************************************************************

*



* the source code ("Material") are owned by Intel Corporation or its suppliers

* or licensors. Title to the Material remains with Intel Corporation or its

* suppliers and licensors. The Material contains trade secrets and proprietary

* and confidential information of Intel or its suppliers and licensors. The

* Material is protected by worldwide copyright and trade secret laws and

* treaty provisions. No part of the Material may be used, copied, reproduced,

* modified, published, uploaded, posted, transmitted, distributed or disclosed


* No license under any patent, copyright, trade secret or other intellectual

* property right is granted to or conferred upon you by disclosure or delivery


* otherwise. Any license under such intellectual property rights must be


*

****************************************************************************

*

* Content : Intel MKL RCI (P)CG Fortran-77 example

*

****************************************************************************

*

C---------------------------------------------------------------------------

2833

Code Examples C

C Example program for solving symmetric positive definite system of

C equations.

C---------------------------------------------------------------------------

PROGRAM rci_pcg_f77_test

IMPLICIT NONE

C---------------------------------------------------------------------------

C Define arrays for the upper triangle of the coefficient matrix and rhsvector

C Compressed sparse row storage is used for sparse representation

C---------------------------------------------------------------------------

INTEGER N, RCI_request, itercount, i

PARAMETER (N=8)

DOUBLE PRECISION rhs(N), solution(N)

INTEGER ia(9)

INTEGER ja(18)

DOUBLE PRECISION a(18)


DATA ia /1,5,8,10,12,15,17,18,19/

DATA ja

1 /1, 3, 6,7,

2834


2 2, 3, 5,

3 3, 8,

4 4, 7,

5 5,6,7,

6 6, 8,

7 7,

8 8/

DATA a

1 /7.D0, 1.D0, 2.D0, 7.D0,

2 -4.D0,8.D0, 2.D0,

3 1.D0, 5.D0,

4 7.D0, 9.D0,

5 5.D0, 1.D0, 5.D0,

6 -1.D0, 5.D0,

7 11.D0,

8 5.D0/

C---------------------------------------------------------------------------

C Allocate storage for the solver ?par and the initial solution vector

C---------------------------------------------------------------------------

INTEGER length

PARAMETER (length=128)

DOUBLE PRECISION expected(N)

DATA expected/1.D0, 0.D0, 1.D0, 0.D0, 1.D0, 0.D0, 1.D0, 0.D0/

INTEGER ipar(length)

DOUBLE PRECISION dpar(length),tmp(N,4)

C---------------------------------------------------------------------------

C Initialize the right hand side through matrix-vector product

C---------------------------------------------------------------------------

2835

Code Examples C

CALL DCSRMV_SY('U', N, A, IA, JA, expected, rhs)

C---------------------------------------------------------------------------

C Initialize the initial guess

C---------------------------------------------------------------------------

DO I=1, N

solution(I)=1.D0

ENDDO

2836


C---------------------------------------------------------------------------


C---------------------------------------------------------------------------

CALL dcg_init(N,solution,rhs,RCI_request,ipar,dpar,tmp)

IF (RCI_request .NE. 0 ) GOTO 999

C---------------------------------------------------------------------------

C Set the desired parameters:

C LOGICAL parameters:

C do residual stopping test

C do not request for the user defined stopping test

C do Preconditioned Conjugate Gradient iterations

C DOUBLE PRECISION parameters

C set the relative tolerance to 1.0D-5 instead of default value 1.0D-6

C---------------------------------------------------------------------------

ipar(9)=1

ipar(10)=0

ipar(11)=1

dpar(1)=1.D-5

C---------------------------------------------------------------------------

C Check the correctness and consistency of the newly set parameters

C---------------------------------------------------------------------------

CALL dcg_check(N,solution,rhs,RCI_request,ipar,dpar,tmp)

IF (RCI_request .NE. 0 ) GOTO 999

C---------------------------------------------------------------------------

C Compute the solution by RCI PCG solver

C Reverse Communications starts here

C---------------------------------------------------------------------------

1 CALL dcg(N,solution,rhs,RCI_request,ipar,dpar,tmp)

2837

Code Examples C

C---------------------------------------------------------------------------

C If RCI_request=0, then the solution was found with the required precision

C---------------------------------------------------------------------------

IF (RCI_request .EQ. 0) THEN

GOTO 700

C---------------------------------------------------------------------------

C If RCI_request=1, then compute the vector A*tmp(:,1)

C and put the result in vector tmp(:,2)

C---------------------------------------------------------------------------

ELSE IF (RCI_request .EQ. 1) THEN

CALL DCSRMV_SY('U', N, A, IA, JA, TMP, TMP(1, 2))

GOTO 1

C---------------------------------------------------------------------------

C If RCI_request=3, then compute vector preconditioner matrix on tmp(:,3)

C and put the result in vector tmp(:,4)

C---------------------------------------------------------------------------

ELSE IF (RCI_request .EQ. 3) THEN

CALL DCOPY(N, TMP(1,3),1, TMP(1, 4), 1)

GOTO 1

ELSE

C---------------------------------------------------------------------------

C If RCI_request=anything else, then dcg subroutine failed

2838


C to compute the solution vector: solution(N)

C---------------------------------------------------------------------------

GOTO 999

ENDIF

C---------------------------------------------------------------------------

C Reverse Communication ends here

C Get the current iteration number

C---------------------------------------------------------------------------

700 CALL dcg_get(N,solution,rhs,RCI_request,ipar,dpar,tmp,

& itercount)

C---------------------------------------------------------------------------

C Print solution vector: solution(N) and number of iterations: itercount

C---------------------------------------------------------------------------

WRITE(*, *) ' The system is successfully solved '

WRITE(*, *) ' The following solution obtained '

WRITE(*,800)(solution(i),i =1,N)

WRITE(*, *) ' Expected solution '

WRITE(*,800)(expected(i),i =1,N)

800 FORMAT(4(F10.3))

WRITE(*,900)(itercount)

900 FORMAT(' Number of iterations: ',1(I2))

GOTO 1000

999 WRITE(*,*) 'Solver returned error code ', RCI_request

STOP

1000 CONTINUE

read *

END

2839

Code Examples C

Fortran Example of Using RCI (Preconditioned) Flexible Generalized Minimal ResidualSolver.

Fortran example results for a non-symmetric indefinite system. Upon successful execution ofthe solver, the following result is printed (up to rounding errors that depend on the computersystem used):

--------------------------------------------------

The SIMPLEST example of usage of RCI FGMRES solver

2840


to solve a non-symmetric indefinite non-degenerate

algebraic system of linear equations

--------------------------------------------------

Some info about the current run of RCI FGMRES method:

As IPAR(8)=1, the automatic test for the maximal number of iterationswill be performed

As IPAR(9)=1, the automatic residual test will be performed

+++

As IPAR(10)=0, the user-defined stopping test will not be requested, thus,RCI_REQUEST will not take the value 2

+++

As IPAR(11)=0, the Preconditioned FGMRES iterations will not be performed,thus, RCI_REQUEST will not take the value 3

+++

As IPAR(12)=1, the automatic test for the norm of the next generated vectoris not equal to zero up to rounding and computational errors will beperformed, thus, RCI_REQUEST will not take the value 4

+++

The system has been SUCCESSFULLY solved

The following solution has been obtained:

COMPUTED_SOLUTION(1)=-0.100E+01

COMPUTED_SOLUTION(2)= 0.100E+01

COMPUTED_SOLUTION(3)= 0.305E-15

COMPUTED_SOLUTION(4)= 0.100E+01

COMPUTED_SOLUTION(5)=-0.100E+01

2841

Code Examples C

The expected solution is:

EXPECTED_SOLUTION(1)=-0.100E+01

EXPECTED_SOLUTION(2)= 0.100E+01



EXPECTED_SOLUTION(5)=-0.100E+01

+++


+++

2842


Example C-13b. Fortran Example to Solve Non-Symmetric IndefiniteSystemC***************************************************************************

C

C Copyright(C) 2005-2006 Intel Corporation. All Rights Reserved.

C The source code contained or described herein and all documents relatedto

C the source code ("Material") are owned by Intel Corporation or its suppliers

C or licensors. Title to the Material remains with Intel Corporation or its

C suppliers and licensors. The Material contains trade secrets and proprietary

C and confidential information of Intel or its suppliers and licensors. The

C Material is protected by worldwide copyright and trade secret laws and

C treaty provisions. No part of the Material may be used, copied, reproduced,

C modified, published, uploaded, posted, transmitted, distributed or disclosed

C in any way without Intel's prior express written permission.

C No license under any patent, copyright, trade secret or other intellectual

C property right is granted to or conferred upon you by disclosure or delivery

C of the Materials, either expressly, by implication, inducement, estoppelor

C otherwise. Any license under such intellectual property rights must be

C express and approved by Intel in writing.

C

C***************************************************************************

C Content:

C Intel MKL RCI (P)FGMRES ((Preconditioned) Flexible Generalized Minimal

CRESidual method) example

C****************************************************************************

C

C---------------------------------------------------------------------------

2843

Code Examples C

C Example program for solving non-symmetric indefinite system of equations

C Simplest case: no preconditioning and no user-defined stopping tests

C---------------------------------------------------------------------------

PROGRAM DFGMRES_NO_PRECON_F

INCLUDE "mkl_rci.fi"

INTEGER N

PARAMETER(N=5)

INTEGER SIZE

PARAMETER (SIZE=128)

C---------------------------------------------------------------------------

C Define arrays for the upper triangle of the coefficient matrix

C Compressed sparse row storage is used for sparse representation

----------------------------------------------------------------------------

INTEGER IA(6)

DATA IA /1,3,6,9,12,14/

INTEGER JA(13)

DATA JA / 1, 3,

1 1, 2, 4,

2 2, 3, 5,

3 3, 4, 5,

4 4, 5 /

DOUBLE PRECISION A(13)

DATA A / 1.0, -1.0,

1 -1.0, 1.0, -1.0,

2 1.0,-2.0, 1.0,

3 -1.0, 2.0,-1.0,

4 -1.0,-3.0 /

C---------------------------------------------------------------------------

2844


C Allocate storage for the ?par parameters and the solution/rhs vectors

C---------------------------------------------------------------------------

2845

Code Examples C

INTEGER IPAR(SIZE)

DOUBLE PRECISION DPAR(SIZE), TMP(N*(2*N+1)+(N*(N+9))/2+1)

DOUBLE PRECISION EXPECTED_SOLUTION(N)

DATA EXPECTED_SOLUTION /-1.0,1.0,0.0,1.0,-1.0/

DOUBLE PRECISION RHS(N)

DOUBLE PRECISION COMPUTED_SOLUTION(N)

C---------------------------------------------------------------------------

C Some additional variables to use with the RCI (P)FGMRES solver

C---------------------------------------------------------------------------

INTEGER ITERCOUNT

INTEGER RCI_REQUEST, I

PRINT *,'--------------------------------------------------'

PRINT *,'The SIMPLEST example of usage of RCI FGMRES solver'

PRINT *,'to solve a non-symmetric indefinite non-degenerate'

PRINT *,' algebraic system of linear equations'

PRINT *,'--------------------------------------------------'

C---------------------------------------------------------------------------

C Initialize variables and the right hand side through matrix-vector product

C---------------------------------------------------------------------------

CALL MKL_DCSRGEMV('N', N, A, IA, JA, EXPECTED_SOLUTION, RHS)

C---------------------------------------------------------------------------

C Initialize the initial guess

C---------------------------------------------------------------------------

DO I=1,N

COMPUTED_SOLUTION(I)=1.0

ENDDO

C---------------------------------------------------------------------------


2846


C---------------------------------------------------------------------------

CALL DFGMRES_INIT(N, COMPUTED_SOLUTION, RHS, RCI_REQUEST, IPAR,

1 DPAR, TMP)

IF (RCI_REQUEST.NE.0) GOTO 999

C---------------------------------------------------------------------------

C Set the desired parameters:

C LOGICAL parameters:

C do residual stopping test

C do not request for the user defined stopping test

C do the check of the norm of the next generated vector automatically

C DOUBLE PRECISION parameters

C set the relative tolerance to 1.0D-3 instead of default value 1.0D-6

C---------------------------------------------------------------------------

IPAR(9)=1

IPAR(10)=0

IPAR(12)=1

DPAR(1)=1.0D-3

C---------------------------------------------------------------------------

C Check the correctness and consistency of the newly set parameters

C---------------------------------------------------------------------------

CALL DFGMRES_CHECK(N, COMPUTED_SOLUTION, RHS, RCI_REQUEST,

1 IPAR, DPAR, TMP)

IF (RCI_REQUEST.NE.0) GOTO 999

C---------------------------------------------------------------------------

C Print the info about the RCI FGMRES method

2847

Code Examples C

C---------------------------------------------------------------------------

PRINT *, ''

PRINT *,'Some info about the current run of RCI FGMRES method:'

PRINT *, ''

IF (IPAR(8).NE.0) THEN

WRITE(*,'(A,I1,A)') 'As IPAR(8)=',IPAR(8),', the automatic

1 test for the maximal number of iterations will be'

PRINT *,'performed'

ELSE

WRITE(*,'(A,I1,A)') 'As IPAR(8)=',IPAR(8),', the automatic test

1 for the maximal number of iterations will be'

PRINT *,'skipped'

ENDIF

PRINT *,'+++'



1 residual test will be performed'

ELSE


1 residual test will be skipped'

ENDIF

PRINT *,'+++'


WRITE(*,'(A,I1,A)') 'As IPAR(10)=',IPAR(10),', the user-defined

1 stopping test will be requested via'

PRINT *,'RCI_REQUEST=2'

ELSE

2848


WRITE(*,'(A,I1,A)') 'As IPAR(10)=',IPAR(10),', the user-defined

1 stopping test will not be requested, thus,'

PRINT *,'RCI_REQUEST will not take the value 2'

ENDIF

PRINT *,'+++'


WRITE(*,'(A,I1,A)') 'As IPAR(11)=',IPAR(11),', the

1 Preconditioned FGMRES iterations will be performed, thus,'

PRINT *,'the preconditioner action will be requested via

1 RCI_REQUEST=3'

ELSE

WRITE(*,'(A,I1,A)') 'As IPAR(11)=',IPAR(11),', the

1 Preconditioned FGMRES iterations will not be performed,'

PRINT *,'thus, RCI_REQUEST will not take the value 3'

ENDIF

PRINT *,'+++'


WRITE(*,'(A,I1,A)')'As IPAR(12)=',IPAR(12),', the automatic

1 test for the norm of the next generated vector is'

PRINT *,'not equal to zero up to rounding and computational

1 errors will be performed,'

PRINT *,'thus, RCI_REQUEST will not take the value 4'

ELSE

2849

Code Examples C

WRITE(*,'(A,I1,A)')'As IPAR(12)=',IPAR(12),', the automatic

1 test for the norm of the next generated vector is'

PRINT *,'not equal to zero up to rounding and computational

1 errors will be skipped,'

PRINT *,'thus, the user-defined test will be requested via

1 RCI_REQUEST=4'

ENDIF

PRINT *,'+++'

C---------------------------------------------------------------------------

C Compute the solution by RCI (P)FGMRES solver without preconditioning

C Reverse Communication starts here

C---------------------------------------------------------------------------

1 CALL DFGMRES(N, COMPUTED_SOLUTION, RHS, RCI_REQUEST, IPAR,

1 DPAR, TMP)

C---------------------------------------------------------------------------

C If RCI_REQUEST=0, then the solution was found with the required precision

C---------------------------------------------------------------------------

IF (RCI_REQUEST.EQ.0) GOTO 3

C---------------------------------------------------------------------------

C If RCI_REQUEST=1, then compute the vector A*TMP(IPAR(22))

C and put the result in vector TMP(IPAR(23))

C---------------------------------------------------------------------------

IF (RCI_REQUEST.EQ.1) THEN

CALL MKL_DCSRGEMV('N',N, A, IA, JA, TMP(IPAR(22)), TMP(IPAR(23)))

GOTO 1

C---------------------------------------------------------------------------

C If RCI_REQUEST=anything else, then DFGMRES subroutine failed

2850


C to compute the solution vector: COMPUTED_SOLUTION(N)

C---------------------------------------------------------------------------

ELSE

GOTO 999

ENDIF

C---------------------------------------------------------------------------

C Reverse Communication ends here

C Get the current iteration number and the FGMRES solution (DO NOT FORGETto

C call DFGMRES_GET routine as COMPUTED_SOLUTION is still containing

C the initial guess!)

C---------------------------------------------------------------------------

3 CALL DFGMRES_GET(N, COMPUTED_SOLUTION, RHS, RCI_REQUEST, IPAR,

1 DPAR, TMP, ITERCOUNT)

C---------------------------------------------------------------------------

C Print solution vector: COMPUTED_SOLUTION(N) and

2851

Code Examples C

C the number of iterations: ITERCOUNT

C---------------------------------------------------------------------------

PRINT *, ''

PRINT *,' The system has been SUCCESSFULLY solved'

PRINT *, ''

PRINT *,' The following solution has been obtained:'

DO I=1,N

WRITE(*,'(A18,I1,A2,E10.3)') 'COMPUTED_SOLUTION(',I,')=',

1 COMPUTED_SOLUTION(I)

ENDDO

PRINT *, ''

PRINT *,' The expected solution is:'

DO I=1,N

WRITE(*,'(A18,I1,A2,E10.3)') 'EXPECTED_SOLUTION(',I,')=',

1 EXPECTED_SOLUTION(I)

ENDDO

PRINT *, ''

PRINT *,' Number of iterations: ',ITERCOUNT

GOTO 1000

999 PRINT *,'The solver has returned the ERROR code ', RCI_REQUEST

1000 CONTINUE

END

C Example of Using RCI (Preconditioned) Flexible Generalized Minimal Residual Solver.

C example results for the same non-symmetric indefinite system as in the previous example.The results are the same up to the notational convention between C and Fortran. Please payspecial attention to how it is recommended to handle the differences between the Fortran andC arrays. Specifically, in this example we adjust the addresses for input/result for user-defined

2852


operations from IPAR(22) to ipar[21]-1, and from IPAR(23) to ipar[22]-1, respectively.Upon successful execution of the solver, the following result is printed (up to rounding errorsthat depend on the computer system used):

--------------------------------------------------

The SIMPLEST example of usage of RCI FGMRES solver

to solve a non-symmetric indefinite non-degenerate

algebraic system of linear equations

--------------------------------------------------

Some info about the current run of RCI FGMRES method:

As ipar[7]=1, the automatic test for the maximal number of iterations willbe performed

+++

As ipar[8]=1, the automatic residual test will be performed

+++

As ipar[9]=0, the user-defined stopping test will not be requested, thus,

RCI_request will not take the value 2

+++

As ipar[10]=0, the Preconditioned FGMRES iterations will not be performed,thus, RCI_request will not take the value 3

+++

As ipar[11]=1, the automatic test for the norm of the next generated vectoris not equal to zero up to rounding and computational errors will beperformed, thus, RCI_request will not take the value 4

+++

The system has been SUCCESSFULLY solved

The following solution has been obtained:

2853

Code Examples C

computed_solution[0]=-1.000000e+000

computed_solution[1]=1.000000e+000

computed_solution[2]=3.053113e-016

computed_solution[3]=1.000000e+000

computed_solution[4]=-1.000000e+000

The expected solution is:

expected_solution[0]=-1.000000e+000

expected_solution[1]=1.000000e+000



expected_solution[4]=-1.000000e+000


2854


Example C-13c. C Example to Solve Non-Symmetric Indefinite System/******************************************************************************

/* INTEL CONFIDENTIAL

/* Copyright(C) 2005-2006 Intel Corporation. All Rights Reserved.

/* The source code contained or described herein and all documents relatedto

/* the source code ("Material")are owned by Intel Corporation or itssuppliers

/* or licensors. Title to the Material remains with Intel Corporation orits

/* suppliers and licensors.The Material contains trade secrets andproprietary

/* and confidential information of Intel or its suppliers and licensors.The

/* Material is protected by worldwide copyright and trade secret lawsand

/* treaty provisions. No part of the Material may be used, copied,reproduced,

/* modified,published, uploaded, posted, transmitted, distributed ordisclosed

/* in any way without Intel's prior express written permission.

/* No license under any patent, copyright, trade secret or otherintellectual

/* property right is granted to or conferred upon you by disclosure ordelivery

/* of the Materials, either expressly, by implication, inducement, estoppelor

/* otherwise. Any license under such intellectual property rightsmust be

/* express and approved by Intel in writing.

/*

/******************************************************************************

/* Content:

2855

Code Examples C

/* Intel MKL RCI (P)FGMRES ((Preconditioned) Flexible Generalized Minimal

/* RESidual method)example

/*****************************************************************************/

/*---------------------------------------------------------------------------

/* Example program for solving non-symmetric indefinite system of equations

/* Simplest case: no preconditioning and no user-defined stopping tests

/*---------------------------------------------------------------------------*/

#include <stdio.h>


#include "mkl_spblas.h"


#define N 5

#define size 128

int main(void)

{

2856

/*---------------------------------------------------------------------------

/* Define arrays for the upper triangle of the coefficient matrix

/* Compressed sparse row storage is used for sparse representation

/*---------------------------------------------------------------------------*/

int ia[6]={1,3,6,9,12,14};

int ja[13]={ 1, 3,

1, 2, 4,

2, 3, 5,

3, 4, 5,

4, 5 };

double A[13]={ 1.0, -1.0,

-1.0, 1.0, -1.0,

1.0,-2.0, 1.0,

-1.0, 2.0,-1.0,

-1.0,-3.0 };

/*---------------------------------------------------------------------------

/* Allocate storage for the ?par parameters and the solution/rhs vectors

/*---------------------------------------------------------------------------*/

2857

Code Examples C

int ipar[size];

double dpar[size], tmp[N*(2*N+1)+(N*(N+9))/2+1];

double expected_solution[N]={-1.0,1.0,0.0,1.0,-1.0};

double rhs[N];

double computed_solution[N];

/*---------------------------------------------------------------------------

/* Some additional variables to use with the RCI (P)FGMRES solver

/*---------------------------------------------------------------------------*/

int itercount;

int RCI_request, i, ivar;

double dvar;

char cvar;

printf("--------------------------------------------------\n");

printf("The SIMPLEST example of usage of RCI FGMRES solver\n");

printf("to solve a non-symmetric indefinite non-degenerate\n");

printf(" algebraic system of linear equations\n");

printf("--------------------------------------------------\n\n");

/*---------------------------------------------------------------------------

/* Initialize variables and the right hand side through matrix-vectorproduct

/*---------------------------------------------------------------------------*/

ivar=N;

cvar='N';

mkl_dcsrgemv(&cvar, &ivar, A, ia, ja, expected_solution, rhs);

/*---------------------------------------------------------------------------

if (RCI_request!=0) goto FAILED;

2858


/*---------------------------------------------------------------------------

/* Set the desired parameters:

/* LOGICAL parameters:

/* do residual stopping test

/* do not request for the user defined stopping test

/* do the check of the norm of the next generated vector automatically

/* DOUBLE PRECISION parameters

/* set the relative tolerance to 1.0D-3 instead of default value 1.0D-6

/*---------------------------------------------------------------------------*/

ipar[8]=1;

ipar[9]=0;

ipar[11]=1;

dpar[0]=1.0E-3;

/*---------------------------------------------------------------------------

/* Check the correctness and consistency of the newly set parameters

/*---------------------------------------------------------------------------*/

dfgmres_check(&ivar, computed_solution, rhs, &RCI_request, ipar, dpar, tmp);

if (RCI_request!=0) goto FAILED;

/*---------------------------------------------------------------------------

/* Print the info about the RCI FGMRES method

/*---------------------------------------------------------------------------*/

printf("Some info about the current run of RCI FGMRES method:\n\n");

if (ipar[7])

2859

Code Examples C

{

printf("As ipar[7]=%d, the automatic test for the maximal number ofiterations will be\n", ipar[7]);

printf("performed\n");

}

else

{

printf("As ipar[7]=%d, the automatic test for the maximal number ofiterations will be\n", ipar[7]);

printf("skipped\n");

}

printf("+++\n");

if (ipar[8])

{

printf("As ipar[8]=%d, the automatic residual test will be performed\n",ipar[8]);

}

else

{

printf("As ipar[8]=%d, the automatic residual test will be skipped\n",ipar[8]);

}

printf("+++\n");

if (ipar[9])

2860


{

printf("As ipar[9]=%d, the user-defined stopping test will be requestedvia\n", ipar[9]);

printf("RCI_request=2\n");

}

else

{

printf("As ipar[9]=%d, the user-defined stopping test will not berequested, thus,\n", ipar[9]);

printf("RCI_request will not take the value 2\n");

}

printf("+++\n");

if (ipar[10])

{

printf("As ipar[10]=%d, the Preconditioned FGMRES iterations will beperformed, thus,\n", ipar[10]);

printf("the preconditioner action will be requested viaRCI_request=3\n");

}

else

{

printf("As ipar[10]=%d, the Preconditioned FGMRES iterations will notbe performed,\n", ipar[10]);

printf("thus, RCI_request will not take the value 3\n");

}

printf("+++\n");

if (ipar[11])

2861

Code Examples C

{

printf("As ipar[11]=%d, the automatic test for the norm of the nextgenerated vector is\n", ipar[11]);

printf("not equal to zero up to rounding and computational errors willbe performed,\n");

printf("thus, RCI_request will not take the value 4\n");

}

else

2862


{

printf("As ipar[11]=%d, the automatic test for the norm of the nextgenerated vector is\n", ipar[11]);

printf("not equal to zero up to rounding and computational errors willbe skipped,\n");

printf("thus, the user-defined test will be requested viaRCI_request=4\n");

}

printf("+++\n\n");

/*---------------------------------------------------------------------------

/* Compute the solution by RCI (P)FGMRES solver without preconditioning

/* Reverse Communication starts here

/*---------------------------------------------------------------------------*/

ONE: dfgmres(&ivar, computed_solution, rhs, &RCI_request, ipar, dpar, tmp);

/*---------------------------------------------------------------------------

/* If RCI_request=0, then the solution was found with the requiredprecision

/*---------------------------------------------------------------------------*/

if (RCI_request==0) goto COMPLETE;

/*---------------------------------------------------------------------------

/* If RCI_request=1, then compute the vector A*tmp[ipar[21]-1]

/* and put the result in vector tmp[ipar[22]-1]

/*---------------------------------------------------------------------------

/* NOTE that ipar[21] and ipar[22] contain FORTRAN style addresses,

/* therefore, in C code it is required to subtract 1 from them to get

2863

Code Examples C

/* C style addresses

/*---------------------------------------------------------------------------*/

if (RCI_request==1)

{

mkl_dcsrgemv(&cvar, &ivar, A, ia, ja, &tmp[ipar[21]-1], &tmp[ipar[22]-1]);

goto ONE;

}

/*---------------------------------------------------------------------------

/* If RCI_request=anything else, then dfgmres subroutine failed

/* to compute the solution vector: computed_solution[N]

/*---------------------------------------------------------------------------*/

else

{

goto FAILED;

}

/*---------------------------------------------------------------------------

/* Reverse Communication ends here

/* Get the current iteration number and the FGMRES solution (DO NOT FORGET

/* to call dfgmres_get routine as computed_solution is still containing

2864


/* the initial guess!)

/*---------------------------------------------------------------------------*/

COMPLETE: dfgmres_get(&ivar, computed_solution, rhs, &RCI_request, ipar,dpar, tmp, &itercount);

/*

/*---------------------------------------------------------------------------

/* Print solution vector: computed_solution[N] and the number ofiterations: itercount

/*---------------------------------------------------------------------------*/

printf(" The system has been SUCCESSFULLY solved \n");

printf("\n The following solution has been obtained: \n");

for (i=0;i<N;i++)printf("computed_solution[%d]=%e\n",i,computed_solution[i]);

printf("\n The expected solution is: \n");

for (i=0;i<N;i++)printf("expected_solution[%d]=%e\n",i,expected_solution[i]);

printf("\n Number of iterations: %d",itercount);

goto SUCCEDED;

FAILED: printf("The solver has returned the ERROR code %d", RCI_request);

SUCCEDED: return 0;

}

Fourier Transform Functions Code ExamplesThis section presents code examples of functions described in the “DFT Functions” and “ClusterDFT Functions” sections in the “Fourier Transform Functions” chapter. The examples are groupedin subsections

• Examples for DFT Functions, including Examples of Using Multi-Threading for DFT Computation

• Examples for Cluster DFT Functions.

2865

Code Examples C

DFT Code Examples

This section presents code examples of using the DFT interface functions described in “FourierTransform Functions” chapter. Here are the examples of two one-dimensional computations.These examples use the default settings for all of the configuration parameters, which arespecified in “Configuration Settings”.

Example C-16 One-dimensional In-place DFT (Fortran Interface)! Fortran example.

! 1D complex to complex, and real to conjugate even

Use MKL_DFTI

Complex :: X(32)

Real :: Y(34)

type(DFTI_DESCRIPTOR), POINTER :: My_Desc1_Handle, My_Desc2_Handle

Integer :: Status

...put input data into X(1),...,X(32); Y(1),...,Y(32)

! Perform a complex to complex transform

Status = DftiCreateDescriptor( My_Desc1_Handle, DFTI_SINGLE,

DFTI_COMPLEX, 1, 32 )

Status = DftiCommitDescriptor( My_Desc1_Handle )

Status = DftiComputeForward( My_Desc1_Handle, X )

Status = DftiFreeDescriptor(My_Desc1_Handle)

! result is given by {X(1),X(2),...,X(32)}

2866


! Perform a real to complex conjugate even transform

Status = DftiCreateDescriptor(My_Desc2_Handle, DFTI_SINGLE,

DFTI_REAL, 1, 32)

Status = DftiCommitDescriptor(My_Desc2_Handle)

Status = DftiComputeForward(My_Desc2_Handle, Y)


! result is given in CCS format.

2867

Code Examples C

Example C-16a One-dimensional Out-of-place DFT (Fortran Interface)! Fortran example.


Use MKL_DFTI

Complex :: X_in(32)

Complex :: X_out(32)

Real :: Y_in(32)

Real :: Y_out(34)


Integer :: Status

...put input data into X_in(1),...,X_in(32); Y_in(1),...,Y_in(32)




Status = DftiSetValue( My_Desc1_Handle, DFTI_PLACEMENT, DFTI_NOT_INPLACE)

Status = DftiCommitDescriptor( My_Desc1_Handle )

Status = DftiComputeForward( My_Desc1_Handle, X_in, X_out )


! result is given by {X_out(1),X_out(2),...,X_out(32)}


Status = DftiCreateDescriptor(My_Desc2_Handle, DFTI_SINGLE,

DFTI_REAL, 1, 32)

Status = DftiSetValue( My_Desc2_Handle, DFTI_PLACEMENT, DFTI_NOT_INPLACE)

Status = DftiCommitDescriptor(My_Desc2_Handle)

Status = DftiComputeForward(My_Desc2_Handle, Y_in, Y_out)


! result is given by Y_out in CCS format.

2868


Example C-17 One-dimensional In-place DFT (C Interface)/* C example, float _Complex is defined in C9X */

#include "mkl_dfti.h"

float _Complex x[32];

float y[34];

dfti_descriptor *my_desc1_handle,*my_desc2_handle;

/* .... or alternatively

dfti_descriptor_handle my_desc1_handle, my_desc2_handle; */

long status;

...put input data into x[0],...,x[31]; y[0],...,y[31]

status = DftiCreateDescriptor( &my_desc1_handle, DFTI_SINGLE,

DFTI_COMPLEX, 1, 32);

status = DftiCommitDescriptor( my_desc1_handle );

status = DftiComputeForward( my_desc1_handle, x);

status = DftiFreeDescriptor(&my_desc1_handle);

/* result is x[0], ..., x[31]*/


DFTI_REAL, 1, 32);

status = DftiCommitDescriptor( my_desc2_handle);

status = DftiComputeForward( my_desc2_handle, y);


/* result is given in CCS format*/

2869

Code Examples C

Example C-17a One-dimensional Out-of-place DFT (C Interface)/* C example, float _Complex is defined in C9X */


float _Complex x_in[32];

float _Complex x_out[32];

float y_in[32];

float y_out[34];

dfti_descriptor *my_desc1_handle,*my_desc2_handle;

/* .... or alternatively

dfti_descriptor_handle my_desc1_handle, my_desc2_handle; */

long status;

...put input data into x_in[0],...,x_in[31]; y_in[0],...,y_in[31]



Status = DftiSetValue( My_Desc1_Handle, DFTI_PLACEMENT, DFTI_NOT_INPLACE);

status = DftiCommitDescriptor( my_desc1_handle );

status = DftiComputeForward( my_desc1_handle, x_in, x_out);


/* result is x_out[0], ..., x_out[31]*/


DFTI_REAL, 1, 32);

Status = DftiSetValue( My_Desc2_Handle, DFTI_PLACEMENT, DFTI_NOT_INPLACE);


status = DftiComputeForward( my_desc2_handle, y_in, y_out);


/* result is given by y_out in CCS format*

2870


The following is an example of two simple two-dimensional transforms. Notice that the dataand result parameters in computation functions are all declared as assumed-size rank-1 arrayDIMENSION(0:*). Therefore two-dimensional array must be transformed to one-dimensionalarray by EQUIVALENCE statement or other facilities of Fortran.

Example C-18 Two-dimensional DFT (Fortran Interface)! Fortran example.


Use MKL_DFTI

Complex :: X_2D(32,100)

Real :: Y_2D(34, 102)

Complex :: X(3200)

Real :: Y(3468)




Integer :: Status, L(2)

...put input data into X_2D(j,k), Y_2D(j,k), 1<=j=32,1<=k<=100

...set L(1) = 32, L(2) = 100

...the transform is a 32-by-100



DFTI_COMPLEX, 2, L)

Status = DftiCommitDescriptor( My_Desc1_Handle)

Status = DftiComputeForward( My_Desc1_Handle, X)


! result is given by X_2D(j,k), 1<=j<=32, 1<=k<=100

2871

Code Examples C

DFTI_REAL, 2, L)

Status = DftiCommitDescriptor( My_Desc2_Handle)

Status = DftiComputeForward( My_Desc2_Handle, Y)


! result is given by the complex value z(j,k) 1<=j<=32; 1<=k<=100

! and is stored in CCS format

2872

Example C-19 Two-dimensional DFT (C Interface)/* C example */


float _Complex x[32][100];

float y[34][102];

dfti_descriptor_handle my_desc1_handle, my_desc2_handle;

/* or alternatively

dfti_descriptor *my_desc1_handle,*my_desc2_handle; */

long status, l[2];

...put input data into x[j][k] 0<=j<=31, 0<=k<=99

...put input data into y[j][k] 0<=j<=31, 0<=k<=99

l[0] = 32; l[1] = 100;


DFTI_COMPLEX, 2, l);


status = DftiComputeForward( my_desc1_handle, x);


/* result is the complex value x[j][k], 0<=j<=31, 0<=k<=99 */


DFTI_REAL, 2, l);


status = DftiComputeForward( my_desc2_handle, y);


/* result is the complex value z(j,k) 0<=j<=31; 0<=k<=99

/* and is stored in CCS format*/

The following examples demonstrate how you can change the default configuration settings byusing the DftiSetValue function.

2873

Code Examples C

For instance, to preserve the input data after the DFT computation, the configuration of theDFTI_PLACEMENT should be changed to "not in place" from the default choice of "in place."

The code below illustrates how this can be done:

Example C-20 Changing Default Settings (Fortran)! Fortran example

! 1D complex to complex, not in place

Use MKL_DFTI

Complex :: X_in(32), X_out(32)

type(DFTI_DESCRIPTOR), POINTER :: My_Desc_Handle

Integer :: Status

...put input data into X_in(j), 1<=j<=32

Status = DftiCreateDescriptor( My_Desc_Handle,DFTI_SINGLE, DFTI_COMPLEX, 1, 32)

Status = DftiSetValue( My_Desc_Handle, DFTI_PLACEMENT, DFTI_NOT_INPLACE)

Status = DftiCommitDescriptor( My_Desc_Handle)

Status = DftiComputeForward( My_Desc_Handle, X_in, X_out)

Status = DftiFreeDescriptor (My_Desc_Handle)

! result is X_out(1),X_out(2),...,X_out(32)

2874

Example C-21 Changing Default Settings (C)/* C example */


float _Complex x_in[32], x_out[32];

DFTI_DESCRIPTOR_HANDLE my_desc_handle;

/* or alternatively

DFTI_DESCRIPTOR *my_desc_handle;*/

long status;

...put input data into x_in[j], 0 <= j < 32

status = DftiCreateDescriptor( &my_desc_handle, DFTI_SINGLE,


status = DftiSetValue( my_desc_handle, DFTI_PLACEMENT, DFTI_NOT_INPLACE);

status = DftiCommitDescriptor( my_desc_handle);

status = DftiComputeForward( my_desc_handle, x_in, x_out);

status = DftiFreeDescriptor(&my_desc_handle);

/* result is x_out[0], x_out[1], ..., x_out[31] */

The Example C-22 below illustrates the use of the status checking functions described in Chapter11 .

Example C-22 Using Status Checking Functionfrom C language:

DFTI_DESCRIPTOR_HANDLE desc;

long status, class_error, value;

char* error_message;

. . . descriptor creation and other code

status = DftiGetValue( desc, DFTI_PRECISION, &value); //

2875

Code Examples C

//or any DFTI function

class_error = DftiErrorClass(status, DFTI_NO_ERROR);

if (! class_error) {

printf ("DftiGetValue() fixes the wrong situation and

returns the corresponding value n");

error_message = DftiErrorMessage(status);

printf("error_message = %s \n", error_message);

}

. . .

from Fortran:

type(DFTI_DESCRIPTOR), POINTER :: desc

integer value, status

character(DFTI_MAX_MESSAGE_LENGTH) error_message

logical class_error

. . . descriptor creation and other code

status = DftiGetValue( desc, DFTI_PRECISION, value)

class_error = DftiErrorClass(status, DFTI_NO_ERROR)

if (.not. class_error) then

print *, ' DftiGetValue() fixes the wrong situation and

returns the corresponding value '

error_message = DftiErrorMessage(status)

print *, 'error_message = ', error_message

endif

2876


Below is an example where a 20-by-40 two-dimensional DFT is computed explicitly usingone-dimensional transforms. Notice that the data and result parameters in computation functionsare all declared as assumed-size rank-1 array DIMENSION(0:*). Therefore two-dimensionalarray must be transformed to one-dimensional array by EQUIVALENCE statement or otherfacilities of Fortran.

2877

Code Examples C

Example C-23 Computing 2D DFT by One-Dimensional Transforms

! Fortran

Complex :: X_2D(20,40),

Complex :: X(800)


INTEGER :: STRIDE(2)

type(DFTI_DESCRIPTOR), POINTER

:: Desc_Handle_Dim1

type(DFTI_DESCRIPTOR), POINTER

:: Desc_Handle_Dim2

...

Status = DftiCreateDescriptor(Desc_Handle_Dim1, DFTI_SINGLE,


Status = DftiCreateDescriptor(Desc_Handle_Dim2, DFTI_SINGLE,


! perform 40 one-dimensional transforms along 1st dimension

Status = DftiSetValue(

Desc_Handle_Dim1, DFTI_NUMBER_OF_TRANSFORMS, 40 )

Status = DftiSetValue( Desc_Handle_Dim1,

DFTI_INPUT_DISTANCE, 20 )

Status = DftiSetValue( Desc_Handle_Dim1,

DFTI_OUTPUT_DISTANCE, 20 )

Status = DftiCommitDescriptor( Desc_Handle_Dim1 )

Status = DftiComputeForward( Desc_Handle_Dim1, X )

! perform 20 one-dimensional transforms along 2nd dimension

Stride(1) = 0; Stride(2) = 20

Status = DftiSetValue( Desc_Handle_Dim2, DFTI_NUMBER_OF_TRANSFORMS, 20 )

2878


Status = DftiSetValue( Desc_Handle_Dim2, DFTI_INPUT_DISTANCE, 1 )

Status = DftiSetValue( Desc_Handle_Dim2, DFTI_OUTPUT_DISTANCE, 1 )

Status = DftiSetValue( Desc_Handle_Dim2, DFTI_INPUT_STRIDES, Stride )

Status = DftiSetValue( Desc_Handle_Dim2, DFTI_OUTPUT_STRIDES, Stride )

Status = DftiCommitDescriptor( Desc_Handle_Dim2 )

Status = DftiComputeForward( Desc_Handle_Dim2, X )

Status = DftiFreeDescriptor( Desc_Handle_Dim1 )

Status = DftiFreeDescriptor( Desc_Handle_Dim2 )

2879

Code Examples C

/* C */


long stride[2];

long status;

DFTI_DESCRIPTOR_HANDLE desc_handle_dim1;

DFTI_DESCRIPTOR_HANDLE desc_handle_dim2;

...

status = DftiCreateDescriptor( &desc_handle_dim1, DFTI_SINGLE,

DFTI_COMPLEX, 1, 20 );

status = DftiCreateDescriptor( &desc_handle_dim2, DFTI_SINGLE,

DFTI_COMPLEX, 1, 40 );

/* perform 40 one-dimensional transforms along 1st dimension */

/* note that the 1st dimension data are not unit-stride */

stride[0] = 0; stride[1] = 40;

status = DftiSetValue( desc_handle_dim1, DFTI_NUMBER_OF_TRANSFORMS, 40 );

status = DftiSetValue( desc_handle_dim1, DFTI_INPUT_DISTANCE, 1 );

status = DftiSetValue( desc_handle_dim1, DFTI_OUTPUT_DISTANCE, 1 );

status = DftiSetValue( desc_handle_dim1, DFTI_INPUT_STRIDES, stride );

status = DftiSetValue( desc_handle_dim1, DFTI_OUTPUT_STRIDES, stride );

status = DftiCommitDescriptor( desc_handle_dim1 );

status = DftiComputeForward( desc_handle_dim1, x );

/* perform 20 one-dimensional transforms along 2nd dimension */

/* note that the 2nd dimension is unit stride */

status = DftiSetValue( desc_handle_dim2, DFTI_NUMBER_OF_TRANSFORMS, 20 );

status = DftiSetValue( desc_handle_dim2,

DFTI_INPUT_DISTANCE, 40 );

status = DftiSetValue( desc_handle_dim2,

2880


DFTI_OUTPUT_DISTANCE, 40 );

status = DftiCommitDescriptor( desc_handle_dim2 );

status = DftiComputeForward( desc_handle_dim2, x );

status = DftiFreeDescriptor( &Desc_Handle_Dim1 );

status = DftiFreeDescriptor( &Desc_Handle_Dim2 );

The following are examples of real multi-dimensional transforms with CCE format storage ofconjugate-even complex matrix. Example C-24 is two-dimensional in-place transform andExample C-24a is two-dimensional out-of-place transform in Fortran interface. Example C-25is three-dimensional out-of-place transform in C interface. Notice that the data and resultparameters in computation functions are all declared as assumed-size rank-1 arrayDIMENSION(0:*). Therefore two-dimensional array must be transformed to one-dimensionalarray by EQUIVALENCE statement or other facilities of Fortran.

2881

Code Examples C

Example C-24 Two-Dimensional REAL In-place DFT (Fortran Interface)! Fortran example.

! 2D and real to conjugate even

Use MKL_DFTI

Real :: X_2D(34,100) ! 34 = (32/2 + 1)*2

Real :: X(3400)




Integer :: strides_in(3)

Integer :: strides_out(3)

...put input data into X_2D(j,k), 1<=j=32,1<=k<=100

...set L(1) = 32, L(2) = 100

...set strides_in(1) = 0, strides_in(2) = 1, strides_in(3) = 34

...set strides_out(1) = 0, strides_out(2) = 1, strides_out(3) = 17



Status = DftiCreateDescriptor( My_Desc_Handle, DFTI_SINGLE,

DFTI_REAL, 2, L )

Status = DftiSetValue(My_Desc_Handle, DFTI_CONJUGATE_EVEN_STORAGE,

DFTI_COMPLEX_COMPLEX)

Status = DftiSetValue(My_Desc_Handle, DFTI_INPUT_STRIDES, strides_in)

Status = DftiSetValue(My_Desc_Handle, DFTI_OUTPUT_STRIDES, strides_out)

Status = DftiCommitDescriptor( My_Desc_Handle)

Status = DftiComputeForward( My_Desc_Handle, X )

Status = DftiFreeDescriptor(My_Desc_Handle)

! result is given by the complex value z(j,k) 1<=j<=17; 1<=k<=100 and

! is stored in real matrix X_2D in CCE format.

2882

Example C-24a Two-Dimensional REAL Out-of-place DFT (FortranInterface)

! Fortran example.

! 2D and real to conjugate even

Use MKL_DFTI

Real :: X_2D(32,100)

Complex :: Y_2D(17, 100) ! 17 = 32/2 + 1

2883

Code Examples C

Real :: X(3200)

Complex :: Y(1700)





Integer :: strides_out(3)

...put input data into X_2D(j,k), 1<=j=32,1<=k<=100

...set L(1) = 32, L(2) = 100

...set strides_out(1) = 0, strides_out(2) = 1, strides_out(3) = 17



Status = DftiCreateDescriptor( My_Desc_Handle, DFTI_SINGLE,

DFTI_REAL, 2, L )

Status = DftiSetValue(My_Desc_Handle,

DFTI_CONJUGATE_EVEN_STORAGE, DFTI_COMPLEX_COMPLEX)

Status = DftiSetValue( My_Desc_Handle, DFTI_PLACEMENT, DFTI_NOT_INPLACE )

Status = DftiSetValue(My_Desc_Handle,

DFTI_OUTPUT_STRIDES, strides_out)

Status = DftiCommitDescriptor(My_Desc_Handle)

Status = DftiComputeForward(My_Desc_Handle, X, Y)

Status = DftiFreeDescriptor(My_Desc_Handle)

! result is given by the complex value z(j,k) 1<=j<=17; 1<=k<=100 and

! is stored in complex matrix Y_2D in CCE format.

2884

Example C-25 Three-Dimensional REAL DFT (C Interface)

/* C example */


float x[32][100][19];

float _Complex y[32][100][10]; /* 10 = 19/2 + 1 */

DFTI_DESCRIPTOR_HANDLE my_desc_handle

/* or alternatively

DFTI_DESCRIPTOR *my_desc_handle*/

long status, l[3];

long strides_out[4];

...put input data into x[j][k][s] 0<=j<=31, 0<=k<=99, 0<=s<=18

2885

Code Examples C

l[0] = 32; l[1] = 100; l[2] =

19;

strides_out[0] = 0; strides_out[1] = 1000;

strides_out[2] = 10; strides_out[3] = 1;

status = DftiCreateDescriptor( &my_desc_handle, DFTI_SINGLE,

DFTI_REAL, 3, l );

Status = DftiSetValue(my_desc_handle,

DFTI_CONJUGATE_EVEN_STORAGE, DFTI_COMPLEX_COMPLEX);

Status = DftiSetValue( my_desc_handle, DFTI_PLACEMENT, DFTI_NOT_INPLACE );

Status = DftiSetValue(my_desc_handle,

DFTI_OUTPUT_STRIDES, strides_out);

status = DftiCommitDescriptor(my_desc_handle);

status = DftiComputeForward(my_desc_handle, x, y);

status = DftiFreeDescriptor(&my_desc_handle);

/* result is the complex value z(j,k,s) 0<=j<=31; 0<=k<=99, 0<=s<=9

and is stored in complex matrix y in CCE format. */

Examples of Using Multi-Threading for DFT Computation

The following example program shows how to employ internal threading in Intel MKL for DFTcomputation (see case 1 in “Number of user threads”).

To specify the number of threads inside Intel MKL, use the following settings:

set OMP_NUM_THREADS = 1 for one-threaded mode;

set OMP_NUM_THREADS = 4 for multi-threaded mode.

Note that the configuration parameter DFTI_NUMBER_OF_USER_THREADS must be equal to itsdefault value 1.

2886

Example C-26 Using Intel MKL Internal Threading Mode


void main () {

float x[200][100];

DFTI_DESCRIPTOR_HANDLE my_desc1_handle;

long status, len[2];

//...put input data into x[j][k] 0<=j<=199, 0<=k<=99

len[0] = 200; len[1] = 100;

status = DftiCreateDescriptor( &my_desc1_handle, DFTI_SINGLE,DFTI_REAL, 2,

len);

status = DftiCommitDescriptor(

my_desc1_handle);

status = DftiComputeForward(

my_desc1_handle, x);


}

The following Example C-27 illustrates a parallel customer program with each descriptor instanceused only in a single thread (see case 2 in “Number of user threads”).

To specify the number of threads, use the following settings:

set MKL_SERIAL = yes (or YES) for single-threaded mode in Intel MKL (recommended);

set OMP_NUM_THREADS = 4 for multi-threaded mode in customer program.

The configuration parameter DFTI_NUMBER_OF_USER_THREADS must be equal to its defaultvalue 1.

Note that in this example the program can be transformed to become single-threaded on thecustomer level but using parallel mode within Intel MKL. To achieve this, you need to set theparameter DFTI_NUMBER_OF_TRANSFORMS = 4 and to set the corresponding parameterDFTI_INPUT_DISTANCE = 5000.

2887

Code Examples C

Example C-27 Using Parallel Mode with Multiple Descriptors


#include "omp.h"

void main ()

{


long len[2];


len[0] = 50; len[1] = 100;

// each thread calculates real DFT for matrix (50*100)

#pragma omp parallel

{

DFTI_DESCRIPTOR_HANDLE my_desc_handle;

long myStatus;

int myID = omp_get_thread_num ();

myStatus = DftiCreateDescriptor (my_desc_handle, DFTI_SINGLE,

DFTI_COMPLEX, 2, len);

myStatus = DftiCommitDescriptor (my_desc_handle);

myStatus = DftiComputeForward (my_desc_handle, &x[myID * len[0] *len[1]);

myStatus = DftiFreeDescriptor (&my_desc_handle);

} /* End OpenMP parallel region */

}

The following Example C-28 illustrates a parallel customer program with a common descriptorused in several threads (see case 3 in “Number of user threads”.

2888


In this case the number of threads, as well as any other configuration parameter, must not bechanged after DFT initialization by the DftiCommitDescriptor() function is done.

Example C-28 Using Parallel Mode with a Common Descriptor

// set number of threads inside Intel MKL:

//rem set MKL_SERIAL = YES - is not required

// since one-threaded mode for Intel MKL is forced automatically

2889

Code Examples C

// set OMP_NUM_THREADS = 4 - multi-threaded mode for customer


#include "omp.h"

void main ()

{


long status;

DFTI_DESCRIPTOR_HANDLE desc_handle;

int nThread = omp_get_max_threads ();

long len[2];


len[0] = 50; len[1] = 100;

status =

DftiCreateDescriptor (desc_handle, DFTI_SINGLE, DFTI_COMPLEX, 2, len);

status = DftiSetValue (desc_handle,

DFTI_NUMBER_OF_USER_THREADS, nThread);

status = DftiCommitDescriptor (desc_handle);

// each thread calculates real DFT for matrix (50*100)

#pragma omp parallel num_threads(nThread)

{

long myStatus;

int myID = omp_get_thread_num ();

myStatus = DftiComputeForward (desc_handle, &x[myID * len[0] * len[1]);

} /* End OpenMP parallel region */

2890


status = DftiFreeDescriptor (&desc_handle);

}

Examples for Cluster DFT Functions

The C example below computes 2-dimensional out-of-place FFT using Cluster DFT:

Example 29 2D Out-of-place Cluster DFT ComputationDFTI_DESCRIPTOR_DM_HANDLE desc;

long len[2],v,i,j,n,s;

Complex *in,*out;

MPI_Init(...);

// Create descriptor for 2D FFT

len[0]=nx;

len[1]=ny;

DftiCreateDescriptorDM(MPI_COMM_WORLD,&desc,DFTI_DOUBLE,DFTI_COMPLEX,2,len);

// Ask necessary length of in and out arrays and allocate memory

DftiGetValueDM(desc,CDFT_LOCAL_SIZE,&v);

in=(Complex*)malloc(v*sizeof(Complex));

out=(Complex*)malloc(v*sizeof(Complex));

// Fill local array with initial data. Current process performs n rows,

// 0 row of in corresponds to s row of virtual global array

DftiGetValueDM(desc,CDFT_LOCAL_NX,&n);

DftiGetValueDM(desc,CDFT_LOCAL_X_START,&s);

// Virtual global array globalIN is defined by function f as

2891

Code Examples C

// globalIN[i*ny+j]=f(i,j)

for(i=0;i<n;i++)

for(j=0;j<ny;j++) in[i*ny+j]=f(i+s,j);

// Set that we want out-of-place transform (default is DFTI_INPLACE)

DftiSetValueDM(desc,DFTI_PLACEMENT,DFTI_NOT_INPLACE);

// Commit descriptor, calculate FFT, free descriptor

DftiCommitDescriptorDM(desc);

DftiComputeForwardDM(desc,in,out);

// Virtual global array globalOUT is defined by function g as

// globalOUT[i*ny+j]=g(i,j)

// Now out contains result of FFT. out[i*ny+j]=g(i+s,j)

DftiFreeDescriptorDM(&desc);

free(in);

free(out);

MPI_Finalize();

The C example below illustrates one-dimensional in-place cluster DFT computations effectedwith a user-defined workspace:

2892

Example 30 1D In-place Cluster DFT ComputationsDFTI_DESCRIPTOR_DM_HANDLE desc;

long len,v,i,n_out,s_out;

Complex *in,*work;

MPI_Init(...);

// Create descriptor for 1D FFT

DftiCreateDescriptorDM(MPI_COMM_WORLD,&desc,DFTI_DOUBLE,DFTI_COMPLEX,1,len);

// Ask necessary length of array and workspace and allocate memory

DftiGetValueDM(desc,CDFT_LOCAL_SIZE,&v);

in=(Complex*)malloc(v*sizeof(Complex));

work=(Complex*)malloc(v*sizeof(Complex));

// Fill local array with initial data. Local array has n elements,

// 0 element of in corresponds to s element of virtual global array

DftiGetValueDM(desc,CDFT_LOCAL_NX,&n);

DftiGetValueDM(desc,CDFT_LOCAL_X_START,&s);

// Set work array as a workspace

DftiSetValueDM(desc,CDFT_WORKSPACE,work);

// Virtual global array globalIN is defined by function f as globalIN[i]=f(i)

for(i=0;i<n;i++) in[i]=f(i+s);

// Commit descriptor, calculate FFT, free descriptor

DftiCommitDescriptorDM(desc);

DftiComputeForwardDM(desc,in);

DftiGetValueDM(desc,CDFT_LOCAL_OUT_NX,&n_out);

DftiGetValueDM(desc,CDFT_LOCAL_OUT_X_START,&s_out);

// Virtual global array globalOUT is defined by function g asglobalOUT[i]=g(i)

// Now in contains result of FFT. Local array has n_out elements,

2893

Code Examples C

// 0 element of in corresponds to s_out element of virtual global array.

// in[i]==g(i+s_out)

DftiFreeDescriptorDM(&desc);

free(in);

free(work);

MPI_Finalize();

Interval Linear Solvers Code ExamplesThis section presents code examples of using the routines described in “Interval Linear Solvers”.These routines are intended for computing enclosures and estimates of the solution sets tointerval linear systems of equations as well as for checking properties of interval matrices andtheir inversion.

Example C-27. Interval Gauss-Seidel method

Given an interval system of linear algebraic equations, interval Gauss-Seidel method(implemented as ?gegss routine) is often applied for enclosing a desired portion of the solutionset that is bounded by a prescribed interval box.

Consider the following interval linear system of equations

proposed first by E. Hansen (see Hansen92). Does its solution set intersect the interval box

2894


The following sample program answers the above question.

PROGRAM DIGEGSS_EXAMPLE

!

! Example program enclosing the solution set to a square interval

! linear system by interval Gauss-Seidel iterative method

!

!----------------------------------------------------------------------!


IMPLICIT NONE

!----------------------------------------------------------------------!

INTEGER, PARAMETER :: DIM = 2

INTEGER :: NRHS, LDA, LDB, NITS,

INFO, I, J

REAL(8) :: EPSILON

TYPE(D_INTERVAL) :: A(DIM,DIM), B(DIM,1),

ENCL(DIM,1)

CHARACTER(1) :: TRANS

!----------------------------------------------------------------------!

PRINT 300

!----------------------------------------------------------------------!

! !

! Initializing the input data - !

! !

TRANS = 'N'

NRHS = 2

A(1,1) = DINTERVAL(2.,3.); A(1,2) = DINTERVAL(0.,1.);

A(2,1) = DINTERVAL(0.,1.); A(2,2) = DINTERVAL(2.,3.);

LDA = 2

B(1,1) = DINTERVAL(0.,120.); B(2,1) = DINTERVAL(60.,240.);

2895

Code Examples C

LDB = 2

EPSILON = 1.D-6

NITS = 20

!----------------------------------------------------------------------!

!

! Assigning the bounding box for the solution set -

DO I = 1, DIM

ENCL(I,1) = DINTERVAL(0.,200.)

END DO

!----------------------------------------------------------------------!

CALL DIGEGSS(TRANS,

DIM, NRHS, A, LDA, B, LDB, ENCL, EPSILON, NITS, INFO )

!----------------------------------------------------------------------!

!

! Outputting the solution

IF( INFO /= 0 ) THEN

PRINT 400

ELSE

PRINT 600

DO I = 1, DIM

PRINT *, '[', B(I,1), ']'

END DO

END IF

!----------------------------------------------------------------------!

300 FORMAT (/,' **** SOLVING INTERVAL LINEAR SYSTEM **** ',/, &

' by interval Gauss-Seidel method ')

400 FORMAT (/,' The interval Gauss-Seidel method fails. ')

600 FORMAT (/,' Outer interval estimate of the solution set:',/)

2896


!----------------------------------------------------------------------!

END PROGRAM DIGEGSS_EXAMPLE

Assigning double-precision intervals to the entries of the matrix A and right-hand side vectorB is carried out by DINTERVAL function that turns two real numbers into the interval havingthese reals as endpoints. Running the above code produces the answer

**** SOLVING INTERVAL LINEAR SYSTEM****

by interval Gauss-Seidel method

Outer interval estimate of the solution set:

[ 0.000000000000000E+000 60.0000000000000 ]

[ 0.000000000000000E+000 120.000000000000 ]

One can make sure that the resulting box really encloses the required portion of the solutionset after having a look at the corresponding graph from the paper Hansen92. Moreover, it iseven the tightest possible enclosure.

2897

Code Examples C

Example C-28. Hansen-Bliek-Rohn procedure

The following Fortran-90 program illustrates the use of digehbs routine implementing“semiinterval” Hansen-Bliek-Rohn procedure for outer interval estimation of the solution setsto interval linear systems.

PROGRAM DIGEHBS_EXAMPLE

!

! Example program for enclosing the solution set to square interval

! interval system of equations by Hansen-Bliek-Rohn procedure

!

!----------------------------------------------------------------------!


IMPLICIT NONE

!----------------------------------------------------------------------!

INTEGER, PARAMETER :: DIM = 2

INTEGER :: LDA, LDB, INFO, I, J

TYPE(D_INTERVAL), ALLOCATABLE ::

A(:,:), B(:)


!----------------------------------------------------------------------!

PRINT 300

!----------------------------------------------------------------------!

!

! Initializing the input data -

!

TRANS = 'N'

ALLOCATE( A(DIM,DIM), B(DIM) )



LDA = 2

2898


B(1) = DINTERVAL(0.,2.); B(2) = DINTERVAL(0.,2.)

LDB = 2

!----------------------------------------------------------------------!

CALL DIGEHBS( TRANS, DIM, A, LDA,

B, LDB, INFO )

!----------------------------------------------------------------------!


PRINT 400

ELSE

PRINT 600

DO I = 1, DIM

PRINT *, I, ') [', B(I), ']'

END DO

END IF

!----------------------------------------------------------------------!

DEALLOCATE( A, B )

!----------------------------------------------------------------------!

300 FORMAT (/,' **** SOLVING INTERVAL LINEAR SYSTEM ****',/, &

' by Hansen-Bliek-Rohn procedure ',/)

400 FORMAT (/,' The matrix of the system is not an H-matrix,',/, &

' Hansen-Bliek-Rohn procedure fails. ',/)

600 FORMAT (/,' Enclosure of the solution set: ',/)

!----------------------------------------------------------------------!

END PROGRAM DIGEHBS_EXAMPLE

However, the output of the program looks like


by Hansen-Bliek-Rohn procedure

The matrix of the system is not an H-matrix,

Hansen-Bliek-Rohn procedure fails.

2899

Code Examples C

This result is because the program is applied to the interval linear system

where the interval matrix is not an H-matrix (that is, it does not have diagonal dominance).

2900


However, preconditioning by digemip routine helps to resolve the problem. The next modifiedprogram, which incorporates preliminary preconditioning of the interval linear system undersolution, makes the matrix diagonally dominant and produces an acceptable answer to theproblem.

PROGRAM DIGEMIP_DIGEHBS_EXAMPLE

!

! Example program for enclosing the solution set to square interval

! interval system of equations by Hansen-Bliek-Rohn procedure

!

!----------------------------------------------------------------------!


IMPLICIT NONE

!----------------------------------------------------------------------!

INTEGER, PARAMETER :: DIM = 2, NRHS = 1

INTEGER :: LDA, LDB, INFO, I, J

TYPE(D_INTERVAL), ALLOCATABLE ::

A(:,:), B(:)


!----------------------------------------------------------------------!

PRINT 300

!----------------------------------------------------------------------!

!

! Initializing the input data -

!

TRANS = 'N'

ALLOCATE( A(DIM,DIM), B(DIM) )



LDA = 2

2901

Code Examples C

B(1) = DINTERVAL(0.,2.); B(2) = DINTERVAL(0.,2.)

LDB = 2

!----------------------------------------------------------------------!

CALL DIGEMIP( DIM, NRHS, A, LDA,

B, LDB, INFO )

CALL DIGEHBS( TRANS, DIM, A, LDA,

B, LDB, INFO )

!----------------------------------------------------------------------!


PRINT 400

ELSE

PRINT 600

DO I = 1, DIM

PRINT *, I, ') [', B(I), ']'

END DO

END IF

DEALLOCATE( A, B )

!----------------------------------------------------------------------!

300 FORMAT (/,' **** SOLVING INTERVAL LINEAR SYSTEM ****',/, &

' by Hansen-Bliek-Rohn procedure ',/)

400 FORMAT (/,' The matrix of the system is not an H-matrix,',/, &

' Hansen-Bliek-Rohn procedure fails. ',/)

600 FORMAT (/,' Enclosure of the solution set: ',/)

!----------------------------------------------------------------------!

END PROGRAM DIGEMIP_DIGEHBS_EXAMPLE

2902


This time, the output of the program is


by Hansen-Bliek-Rohn procedure

Enclosure of the solution set:

1 ) [ -4.23529411764708 10.7058823529412 ]

2 ) [ -6.70588235294119 10.8235294117647 ]

(the last digits may change for various computer architectures).

Example C-29. Computing enclosure for inverse interval matrix

Given an interval 2 × 2 matrix

2903

Code Examples C

the following Fortran-90 code computes an enclosure of its inverse interval matrix:

PROGRAM SIGESZI_EXAMPLE

!

!Example program inverting an interval matrix by Sczulz iterative procedure

!

!----------------------------------------------------------------------!


IMPLICIT NONE

!----------------------------------------------------------------------!

INTEGER, PARAMETER :: DIM = 2, LDA = 2

INTEGER :: INFO, I, J

TYPE(S_INTERVAL), ALLOCATABLE ::

A(:,:)

!----------------------------------------------------------------------!

PRINT 300

!----------------------------------------------------------------------!

!

! Initislizing the input data -

!

ALLOCATE( A(LDA,DIM) )

A(1,1) = SINTERVAL(3.,3.); A(1,2) = SINTERVAL(0.,1.)

A(2,1) = SINTERVAL(1.,2.); A(2,2) = SINTERVAL(2.,3.)

!----------------------------------------------------------------------!

CALL SIGESZI ( DIM, A, LDA, INFO )

!----------------------------------------------------------------------!

PRINT 600

DO I = 1, DIM

PRINT *, ('[', A(I,J), ']', J = 1,

DIM )

2904


END DO

DEALLOCATE( A )

!----------------------------------------------------------------------!

300 FORMAT (/,' **** INVERTING INTERVAL MATRIX ****',/, &

' by interval Schulz method ' )

400 FORMAT (/,' Schulz inversion procedure failed. ',/)

600 FORMAT (/,' Enclosure of the inverse matrix ',/)

!----------------------------------------------------------------------!

END PROGRAM SIGESZI_EXAMPLE

The output listing (with small variations depending on the architecture) looks as follows:

**** INVERTING INTERVAL MATRIX ****

by interval Schulz method

Enclosure of the inverse matrix

[ 0.2407409 0.5000001 ][ -0.2500000 0.1018518 ]

[ -0.5000000 5.5555239E-02 ][ 0.1388889 0.7500001 ]

At the same time, if we widen the (1,1) entry of the matrix to the interval [2, 3], the sigesziprocedure fails to compute a finite enclosure of the inverse to the new interval matrix

Nevertheless, the interval linear system with such matrix can be successfully solved byspecialized routines, for example, by interval Gauss method or interval Gauss-Seidel method(see Example C-27).

PDE Support Code ExamplesThis section presents code examples for routines described in the in the “Partial DifferentialEquations Support” chapter. The examples are grouped in subsections

2905

Code Examples C

• Trigonometric Transform Code Examples

• Poisson Library Code Examples.

Trigonometric Transforms Interface Code Examples

Code presented in this section computes solutions of three simple 1D Helmholtz problems withdifferent boundary conditions: DD, NN and ND cases, where “D” denotes a Dirichlet boundarycondition and “N” stands for a Neumann boundary condition. Example C-34 implements thecomputations in C and Example C-35 provides Fortran-90 code.

The algorithm of computing the solution uses Trigonometric Transform routines, described inchapter 13. In the DD case, the sine transform is computed, the NN case uses the cosinetransform and the ND case corresponds to the staggered cosine transform.

Other details of the Helmholtz problems being solved are printed out along with the computedsolutions.

Upon successful execution of Example C-34 the following text is printed out (Example C-35generates similar output):Example of use of MKL Trigonometric Transforms

**********************************************

This example gives the the solutions of the 1D differential problems

with the equation -u"+u=f(x), 0<x<1,

2906

and with 3 types of boundary conditions:

DD case: u(0)=u(1)=0,

NN case: u'(0)=u'(1)=0,

ND case: u'(0)=u(1)=0.

-----------------------------------------------------------------------

In general, the error should be of order O(1.0/n**2)

For this example, the value of n is 8

The approximation error should be of order 5.0e-002 if everything is OK

-----------------------------------------------------------------------

Note that n should be even to use Trigonometric Transforms !

-----------------------------------------------------------------------

DOUBLE PRECISION COMPUTATIONS

=====================================================================

The computed solution of DD problem is

u[0]= 0.000

u[1]= 0.153

u[2]= 0.524

u[3]= 0.895

u[4]= 1.049

u[5]= 0.895

u[6]= 0.524

u[7]= 0.153

u[8]= 0.000

Error=4.873e-002

The computed solution of NN problem is

2907

Code Examples C

u[0]=-0.026

u[1]= 0.128

u[2]= 0.500

u[3]= 0.872

u[4]= 1.026

u[5]= 0.872

u[6]= 0.500

u[7]= 0.128

u[8]=-0.026

Error=2.583e-002

The computed solution of ND problem is

u[0]=-0.009

u[1]= 0.145

u[2]= 0.517

u[3]= 0.890

u[4]= 1.045

u[5]= 0.892

u[6]= 0.522

u[7]= 0.152

u[8]= 0.000

Error=4.470e-002

C code for the computations is given below:

2908


Example C-34 C Example to Solve a Set of 1D Helmholtz Problems*****************************************************************************

! INTEL CONFIDENTIAL

! Copyright(C) 2005 Intel Corporation. All Rights Reserved.



! or licensors. Title to the Material remains with Intel Corporationor its










! otherwise. Any license under such intellectual property rightsmust be


!

!*****************************************************************************

! Content:

! Double precision C test example for trigonometric transforms

2909

Code Examples C

!*****************************************************************************

!

! This example gives the solution of the 1D differential problems

! with the equation -u"+u=f(x), 0<x<1, and with 3 types of boundaryconditions:

! u(0)=u(1)=0 (DD case), or u'(0)=u'(1)=0 (NN case), or u'(0)=u(1)=0 (NDcase)

*/ #include <stdio.h>

#include <malloc.h>

#include <math.h>


#include "mkl_trig_transforms.h"

int main(void)

{

int n=8, i, k, tt_type;

int ir, ipar[128];

/* Note that the size of the transform n must be even !!! */

double pi=3.14159265358979324, xi, c;

double c1, c2, c3, c4, c5, c6;

double *u, *f, *dpar, *lambda;

DFTI_DESCRIPTOR_HANDLE handle = 0; /* Printing the header for the example*/

printf("\n Example of use of MKL Trigonometric Transforms\n");

printf(" **********************************************\n\n");

printf(" This example gives the the solutions of the 1D differentialproblems\n");

printf(" with the equation -u\"+u=f(x), 0<x<1, \n");

printf(" and with 3 types of boundary conditions:\n");

printf(" DD case: u(0)=u(1)=0,\n");

2910

printf(" NN case: u'(0)=u'(1)=0,\n");

printf(" ND case: u'(0)=u(1)=0.\n");

printf("-----------------------------------------------------------------------\n");

printf(" In general, the error should be of order O(1.0/n**2)\n");

printf(" For this example, the value of n is %1i\n", n);

printf(" The approximation error should be of order 5.0e-002 if everythingis OK\n");

printf("-----------------------------------------------------------------------\n");

printf(" Note that n should be even to use Trigonometric Transforms !\n");

printf("-----------------------------------------------------------------------\n");

printf(" DOUBLE PRECISION COMPUTATIONS\n");

printf("=======================================================================\n\n");

u=(double*)malloc((n+1)*sizeof(double));

f=(double*)malloc((n+1)*sizeof(double));

dpar=(double*)malloc((3*n/2+1)*sizeof(double));

lambda=(double*)malloc((n+1)*sizeof(double));

for(i=0;i<=2;i++)

{

/* Varying the type of the transform */

tt_type=i;

/* Computing test solutions u(x) */

for(k=0;k<=n;k++)

{

xi=1.0E0*k/n;

u[k]=pow(sin(pi*xi),2.0E0);

}

2911

Code Examples C

/* Computing the right-hand side f(x) */

for(k=0;k<=n;k++)

{

f[k]=(4.0E0*(pi*pi)+1.0E0)*u[k]-2.0E0*(pi*pi);

}

/* Computing the right-hand side for the algebraic system */

for(k=0;k<=n;k++)

{

f[k]=f[k]/(n*n);

}

2912

if (tt_type==0)

{

/* The Dirichlet boundary conditions */

f[0]=0.0E0;

f[n]=0.0E0;

}

if (tt_type==2)

{

/* The mixed Neumann-Dirichlet boundary conditions */

f[n]=0.0E0;

}

/* Computing the eigenvalues for the three-point finite-differenceproblem */

if (tt_type==0||tt_type==1)

{

for(k=0;k<=n;k++)

{

lambda[k]=pow(2.0E0*sin(0.5E0*pi*k/n),2.0E0)+1.0E0/(n*n);

}

}

if (tt_type==2)

{

for(k=0;k<=n;k++)

{</

lambda[k]=pow(2.0E0*sin(0.25E0*pi*(2*k+1)/n),2.0E0)+1.0E0/(n*n);

}

}

2913

Code Examples C

/* Computing the solution of 1D problem using trigonometric transforms

First we initialize the transform */

d_init_trig_transform(&n,&tt_type,ipar,dpar,&ir);

if (ir!=0) goto FAILURE;

/* Then we commit the transform. Note that the data in f will be changedat this stage !

If you want to keep them, save them in some other array before thecall to the routine */

d_commit_trig_transform(f,&handle,ipar,dpar,&ir);


/* Now we can apply trigonometric transform */

d_forward_trig_transform(f,&handle,ipar,dpar,&ir);


2914


/* Scaling the solution by the eigenvalues */

for(k=0;k<=n;k++)

{

f[k]=f[k]/lambda[k];

}

/* Now we can apply trigonometric transform once again as ONLY

input vector f has changed */

d_backward_trig_transform(f,&handle,ipar,dpar,&ir);


/* Cleaning the memory used by handle

Now we can use handle for other kind of trigonometric transform */

free_trig_transform(&handle,ipar,&ir);


/* Performing the error analysis */

c1=0.0E0;

c2=0.0E0;

c3=0.0E0;

for(k=0;k<=n;k++)

{

/* Computing the absolute value of the exact solution */

c4=fabs(u[k]);

/* Computing the absolute value of the computed solution

Note that the solution is now in place of the former right-handside ! */

c5=fabs(f[k]);

/* Computing the absolute error */

c6=fabs(f[k]-u[k]);

/* Computing the maximum among the above 3 values c4-c6 */

if (c4>c1) c1=c4;

2915

Code Examples C

if (c5>c2) c2=c5;

if (c6>c3) c3=c6;

}

/* Printing the results */

if (tt_type==0)

{

printf("The computed solution of DD problem is\n\n");

for(k=0;k<=n;k++)

{

printf("u[%1i]=%6.3f\n",k,f[k]);

}

printf("\nError=%6.3e\n\n",c3/c1);

}

if (tt_type==1)

{

printf("The computed solution of NN problem is\n\n");

for(k=0;k<=n;k++)

{


}


}

if (tt_type==2)

2916

{

printf("The computed solution of ND problem is\n\n");

for(k=0;k<=n;k++)

{


}


}

/* End of the loop over the different kind of transforms and problems*/

}

/* Jumping over failure message */

goto SUCCESS;

/* Failure message to print if something went wrong */

FAILURE: printf("Failed to compute the solution(s)...");

SUCCESS: return 0;

/* End of the example code */

}

Fortran code for the computations is given below:

2917

Code Examples C

Example C-35 Fortran Example to Solve a Set of 1D Helmholtz Problems!*****************************************************************************

















!

!

!*****************************************************************************

! Content:

2918


! Double precision Fortran90 test example for trigonometric transforms

!*****************************************************************************

! This example gives the solution of the 1D differential problems

! with the equation -u"+u=f(x), 0<x<1, and with 3 types of boundaryconditions:

! u(0)=u(1)=0 (DD case), or u'(0)=u'(1)=0 (NN case), or u'(0)=u(1)=0 (NDcase)

program d_tt_example_bvp

use mkl_dfti

use mkl_trig_transforms

implicit none

2919

Code Examples C

integer n, i, k,j, tt_type

integer ir, ipar(128)

! Note that the size of the transform n must be even !!!

parameter (n=8)

double precision pi, xi

double precision c1, c2, c3, c4, c5, c6

double precision u(n+1), f(n+1), dpar(3*n/2+1), lambda(n+1)

parameter (pi=3.14159265358979324D0)

type(dfti_descriptor), pointer :: handle

! Printing the header for the example

print *, ''

print *, ' Example of use of MKL Trigonometric Transforms'

print *, ' **********************************************'

print *, ''

print *, ' This example gives the solution of the 1D differentialproblems'

print *, ' with the equation -u"+u=f(x), 0<x<1, '

print *, ' and with 3 types of boundary conditions:'

print *, ' DD case: u(0)=u(1)=0,'

print *, ' NN case: u''(0)=u''(1)=0,'

print *, ' ND case: u''(0)=u(1)=0.'

print *, '-----------------------------------------------------------------------'

print *, ' In general, the error should be of order O(1.0/n**2)'

print *, ' For this example, the value of n is', n

print *, ' The approximation error should be of order 0.5E-01, ifeverything is OK'

print *, '-----------------------------------------------------------------------'

print *, ' Note that n should be even to use Trigonometric Transforms!'

2920

print *, '-----------------------------------------------------------------------'

print *, ' DOUBLE PRECISION COMPUTATIONS'

print*,'======================================================================='

print *, ''

do i=0,2

! Varying the type of the transform

tt_type=i

! Computing test solution u(x)

do k=1,n+1

xi=1.0D0*(k-1)/n

u(k)=dsin(pi*xi)**2

end do

! Computing the right-hand side f(x)

do k=1,n+1

f(k)=(4.0D0*(pi**2)+1.0D0)*u(k)-2.0D0*(pi**2)

end do

! Computing the right-hand side for the algebraic system

do k=1,n+1

f(k)=f(k)/(n**2)

end do

2921

Code Examples C

if (tt_type.eq.0) then

! The Dirichlet boundary conditions

f(1)=0.0D0

f(n+1)=0.0D0

end if


! The mixed Neumann-Dirichlet boundary conditions

f(n+1)=0.0D0

end if

! Computing the eigenvalues for the three-point finite-difference problem

if (tt_type.eq.0.or.tt_type.eq.1) then

do k=1,n+1

lambda(k)=(2.0D0*dsin(0.5D0*pi*(k-1)/n))**2+1.0D0/(n**2)

end do

end if


do k=1,n+1

lambda(k)=(2.0D0*dsin(0.25D0*pi*(2*k-1)/n))**2+1.0D0/(n**2)

end do

end if

! Computing the solution of 1D problem using trigonometric transforms

! First we initialize the transform

2922


CALL D_INIT_TRIG_TRANSFORM(n,tt_type,ipar,dpar,ir)

if (ir.ne.0) goto 99

! Then we commit the transform. Note that the data in f will be changed atthis stage !

! If you want to keep them, save them in some other array before the callto the routine

CALL D_COMMIT_TRIG_TRANSFORM(f,handle,ipar,dpar,ir)


! Now we can apply trigonometric transform

CALL D_FORWARD_TRIG_TRANSFORM(f,handle,ipar,dpar,ir)


! Scaling the solution by the eigenvalues

do k=1,n+1

f(k)=f(k)/lambda(k)

end do

! Now we can apply trigonometric transform once again as ONLY input vectorf has changed

CALL D_BACKWARD_TRIG_TRANSFORM(f,handle,ipar,dpar,ir)


! Cleaning the memory used by handle

! Now we can use handle for other KIND of trigonometric transform

CALL FREE_TRIG_TRANSFORM(handle,ipar,ir)


2923

Code Examples C

! Performing the error analysis

c1=0.0D0

c2=0.0D0

c3=0.0D0

do k=1,n+1

! Computing the absolute value of the exact solution

c4=dabs(u(k))

! Computing the absolute value of the computed solution

! Note that the solution is now in place of the former right-hand side !

c5=dabs(f(k))

! Computing the absolute error

c6=dabs(f(k)-u(k))

! Computing the maximum among the above 3 values c4-c6

if (c4.gt.c1) c1=c4

if (c5.gt.c2) c2=c5

if (c6.gt.c3) c3=c6

end do

! Printing the results


print *, 'The computed solution of DD problem is'

print *, ''

do k=1,n+1

write(*,11) k,f(k)

end do

print *, ''

write(*,12) c3/c1

print *, ''

end if

2924



print *, 'The computed solution of NN problem is'

print *, ''

do k=1,n+1

write(*,11) k,f(k)

end do

print *, ''

write(*,12) c3/c1

print *, ''

end if


print *, 'The computed solution of ND problem is'

print *, ''

do k=1,n+1

write(*,11) k,f(k)

end do

print *, ''

write(*,12) c3/c1

print *, ''

end if

! End of the loop over the different kind of transforms and problems

end do

2925

Code Examples C

! Jumping over failure message

go to 1

! Failure message to print if something went wrong

99 continue

print *, 'Failed to compute the solution(s)...'

1 continue

! Print formats

11 format(1x,'u(',I1,')=',F6.3)

12 format(1x,'Relative error =',E10.3)

! End of the example code

end

Poisson Library Code Examples

Cartesian case

The code below computes an approximate solution of a 2D Poisson problem

in the rectangle 0<x<1, 0 <y<1.

The following boundary conditions are imposed for the problem:


for 0≤y≤1.


2926

for 0<x<1.

The exact solution is known to be

Example C-36 implements the computations in C and Example C-37 provides Fortran-90 code.Mind that PL interface cannot be invoked from Fortran-77 due to restrictions imposed by theuse of Intel MKL DFT interface.

The algorithm of computing the approximate solution of a Poisson Problem uses Poisson Libraryroutines, described in chapter 13. Details of the Poisson problem being solved and the errorsare printed out between the computed solution and the exact one.

Upon successful execution of Example C-36 the following text is printed out (Example C-37produces similar output):

Example of use of MKL Poisson Library

**********************************************

2927

Code Examples C

This example gives the solution of 2D Poisson problem

with the equation -u_xx-u_yy=f(x,y), 0<x<1, 0<y<1,

f(x,y)=(8*pi*pi)*sin(2*pi*x)*sin(2*pi*y),

and with the following boundary conditions:

u(0,y)=u(1,y)=1 (Dirichlet boundary conditions),

-u_y(x,0)=-2.0*pi*sin(2*pi*x) (Neumann boundary condition),

u_y(x,1)= 2.0*pi*sin(2*pi*x) (Neumann boundary condition).

-----------------------------------------------------------------------

In general, the error should be of order O(1.0/nx^2+1.0/ny^2)

For this example, the value of nx=ny is 6

The approximation error should be of order 1.0e-01, if everything is OK

-----------------------------------------------------------------------

Note that nx should be even to use Poisson Library !

-----------------------------------------------------------------------


=======================================================================

The number of mesh intervals in x-direction is nx=6

The number of mesh intervals in y-direction is ny=6

2928

In the mesh point (0.167,0.000) the error between the computed and thetrue solution is equal to -7.505e-02

In the mesh point (0.167,0.167) the error between the computed and thetrue solution is equal to 4.432e-02






Double precision 2D Poisson example has successfully PASSED

through all steps of computation!

Note that actual figures in the error may slightly differ from the figures printed above dependingon the architecture and operating system used to run the example.

C code for the problem is presented below:

2929

Code Examples C

Example C-36 C Example to Solve 2D Poisson Problem/*******************************************************************************


/* Copyright(C) 2006 Intel Corporation. All Rights Reserved.


/* the source code ("Material") are owned by Intel Corporation or itssuppliers

/* or licensors. Title to the Material remains with Intel Corporationor its

/* suppliers and licensors. The Material contains trade secrets andproprietary


/* Material is protected by worldwide copyright and trade secretlaws and


/* modified, published, uploaded, posted, transmitted, distributed ordisclosed

/* in any way without Intel");s prior express written permission.






/*

/*******************************************************************************

/* Content:

/* C double precision example of solving 2D Poisson problem in a

2930


/* rectangular domain using MKL Poisson Library

/*

/*******************************************************************************/

#include <stdio.h>

#include <malloc.h>

#include <math.h>

/* Include Poisson Library header files */


#include "mkl_poisson.h"

int main(void)

2931

Code Examples C

{

/* Note that the size of the transform nx must be even !!! */

int nx=6, ny=6;

double pi=3.14159265358979324;

int ix, iy, i, stat;

int ipar[128];

double ax, bx, ay, by, lx, ly, hx, hy, xi, yi, cx, cy;

double *dpar, *f, *u, *bd_ax, *bd_bx, *bd_ay, *bd_by;

double q;

DFTI_DESCRIPTOR_HANDLE xhandle = 0;

char *BCtype;

/* Printing the header for the example */

printf("\n Example of use of MKL Poisson Library\n");

printf(" **********************************************\n\n");

printf(" This example gives the solution of 2D Poisson problem\n");

printf(" with the equation -u_xx-u_yy=f(x,y), 0<x<1, 0<y<1,\n");

printf(" f(x,y)=(8*pi*pi)*sin(2*pi*x)*sin(2*pi*y),\n");

printf(" and with the following boundary conditions:\n");

printf(" u(0,y)=u(1,y)=1 (Dirichlet boundary conditions),\n");

printf(" -u_y(x,0)=-2.0*pi*sin(2*pi*x) (Neumann boundary condition),\n");

printf(" u_y(x,1)= 2.0*pi*sin(2*pi*x) (Neumann boundary condition).\n");

printf("-----------------------------------------------------------------------\n");

printf(" In general, the error should be of order O(1.0/nx^2+1.0/ny^2)\n");

printf(" For this example, the value of nx=ny is %d\n", nx);

printf(" The approximation error should be of order 1.0e-01, if everythingis OK\n");

printf("-----------------------------------------------------------------------\n");

printf(" Note that nx should be even to use Poisson Library !\n");

2932

printf("-----------------------------------------------------------------------\n");


printf("=======================================================================\n\n");

dpar=(double*)malloc((5*nx/2+7)*sizeof(double));

f=(double*)malloc((nx+1)*(ny+1)*sizeof(double));

u=(double*)malloc((nx+1)*(ny+1)*sizeof(double));

bd_ax=(double*)malloc((ny+1)*sizeof(double));

bd_bx=(double*)malloc((ny+1)*sizeof(double));

bd_ay=(double*)malloc((nx+1)*sizeof(double));

bd_by=(double*)malloc((nx+1)*sizeof(double));

/* Defining the rectangular domain 0<x<1, 0<y<1 for 2D Poisson Solver */

ax=0.0E0;

bx=1.0E0;

ay=0.0E0;

by=1.0E0;

/*******************************************************************************

Setting the coefficient q to 0.

Note that this is the way to use Helmholtz Solver to solve Poisson problem!

*******************************************************************************/

2933

Code Examples C

q=0.0E0;

/* Computing the mesh size hx in x-direction */

lx=bx-ax;

hx=lx/nx;

/* Computing the mesh size hy in y-direction */

ly=by-ay;

hy=ly/ny;

/* Filling in the values of the TRUE solutionu(x,y)=sin(2*pi*x)*sin(2*pi*y)+1

in the mesh points into the array u

Filling in the right-hand side f(x,y)=(8*pi*pi+q)*sin(2*pi*x)*sin(2*pi*y)+q

in the mesh points into the array f.

We choose the right-hand side to correspond to the TRUE solution of Poissonequation.

Here we are using the mesh sizes hx and hy computed before to compute

2934


the coordinates (xi,yi) of the mesh points */

for(iy=0;iy<=ny;iy++)

{

for(ix=0;ix<=nx;ix++)

{

xi=hx*ix/lx;

yi=hy*iy/ly;

cx=sin(2*pi*xi);

cy=sin(2*pi*yi);

u[ix+iy*(nx+1)]=1.0E0*cx*cy;

f[ix+iy*(nx+1)]=(8.0E0*pi*pi)*u[ix+iy*(nx+1)];

u[ix+iy*(nx+1)]=u[ix+iy*(nx+1)]+1.0E0;

}

}

/* Setting the type of the boundary conditions on each side of therectangular domain:

On the boundary laying on the line x=0(=ax) Dirichlet boundary conditionwill be used

On the boundary laying on the line x=1(=bx) Dirichlet boundary conditionwill be used

On the boundary laying on the line y=0(=ay) Neumann boundary conditionwill be used

On the boundary laying on the line y=1(=by) Neumann boundary conditionwill be used */

BCtype = "DDNN";

/* Setting the values of the boundary function G(x,y) that is equal tothe TRUE solution

2935

Code Examples C

in the mesh points laying on Dirichlet boundaries */


{

bd_ax[iy]=1.0E0;

bd_bx[iy]=1.0E0;

}

/* Setting the values of the boundary function g(x,y) that is equal tothe normal derivative

of the TRUE solution in the mesh points laying on Neumann boundaries */

for(ix=0;ix<=nx;ix++)

{

bd_ay[ix]=-2.0*pi*sin(2*pi*ix/nx);

bd_by[ix]= 2.0*pi*sin(2*pi*ix/nx);

}

/* Initializing ipar array to make it free from garbage */

for(i=0;i<128;i++)

{

ipar[i]=0;

}

/* Initializing simple data structures of Poisson Library for 2D PoissonSolver */

d_init_Helmholtz_2D(&ax, &bx, &ay, &by, &nx, &ny, BCtype, &q, ipar, dpar,&stat);

if (stat!=0) goto FAILURE;

/* Initializing complex data structures of Poisson Library for 2D PoissonSolver

NOTE: Right-hand side f may be altered after the Commit step. If you wantto keep it,

2936

you should save it in another memory location! */

d_commit_Helmholtz_2D(f, bd_ax, bd_bx, bd_ay, bd_by, &xhandle, ipar, dpar,&stat);


/* Computing the approximate solution of 2D Poisson problem

NOTE: Boundary data stored in the arrays bd_ax, bd_bx, bd_ay, bd_by shouldnot be changed

between the Commit step and the subsequent call to the Solver routine/*

Otherwise the results may be wrong. */

d_Helmholtz_2D(f, bd_ax, bd_bx, bd_ay, bd_by, &xhandle, ipar, dpar, &stat);


/* Cleaning the memory used by xhandle */

free_Helmholtz_2D(&xhandle, ipar, &stat);


/* Now we can use xhandle to solve another 2D Poisson problem*/


printf("The number of mesh intervals in x-direction is nx=%d\n", nx);

printf("The number of mesh intervals in y-direction is ny=%d\n\n",ny);

/* Watching the error along the line x=hx */

ix=1;


{

printf("In the mesh point (%5.3f,%5.3f) the error between thecomputed and the true solution is equal to %10.3e\n", ix*hx, iy*hy,f[ix+iy*(nx+1)]-u[ix+iy*(nx+1)]);

}

2937

Code Examples C

/* Success message to print if everything is OK */

printf("\n Double precision 2D Poisson example has successfullyPASSED\n");

printf(" through all steps of computation!\n");


goto SUCCESS;


FAILURE: printf("\nDouble precision 2D Poisson example FAILED to computethe solution...\n");

SUCCESS: return 0;


}

Fortran code for the computations is given below:

2938


Example C-37 Fortran Example to Solve 2D Poisson Problem!*******************************************************************************

















!

!*******************************************************************************

! Content:

! Fortran-90 double precision example of solving 2D Poisson problem in a

2939

Code Examples C

! rectangular domain using MKL Poisson Library

!

!*******************************************************************************

program Poisson_2D_double_precision

! Include modules defined by mkl_poisson.f90 and mkl_dfti.f90 header files

use mkl_poisson

use mkl_dfti

implicit none

integer nx,ny

! Note that the size of the transform nx must be even !!!

parameter(nx=6, ny=6)

double precision pi

parameter(pi=3.14159265358979324D0)

integer ix, iy, i, stat

integer ipar(128)

double precision ax, bx, ay, by, lx, ly, hx, hy, xi, yi, cx, cy

double precision dpar(5*nx/2+7)

! Note that proper packing of data in right-hand side array f is

2940


! automatically provided by the following declaration of the arrays

double precision f(nx+1,ny+1), u(nx+1,ny+1)

double precision bd_ax(ny+1), bd_bx(ny+1), bd_ay(nx+1), bd_by(nx+1)

double precision q

type(DFTI_DESCRIPTOR), pointer :: xhandle

character(4) BCtype


print *, ''

print *, ' Example of use of MKL Poisson Library'

print *, ' **********************************************'

print *, ''

print *, ' This example gives the solution of 2D Poisson problem'

print *, ' with the equation -u_xx-u_yy=f(x,y), 0<x<1, 0<y<1,'

print *, ' f(x,y)=(8*pi*pi)*sin(2*pi*x)*sin(2*pi*y),'

print *, ' and with the following boundary conditions:'

print *, ' u(0,y)=u(1,y)=1 (Dirichlet boundary conditions),'

print *, ' -u_y(x,0)=-2.0*pi*sin(2*pi*x) (Neumann boundary condition),'

print *, ' u_y(x,1)= 2.0*pi*sin(2*pi*x) (Neumann boundary condition).'

print *, '-----------------------------------------------------------------------'

print *, ' In general, the error should be of orderO(1.0/nx^2+1.0/ny^2)'

print '(1x,a,I1)', ' For this example, the value of nx=ny is ', nx

print *, ' The approximation error should be of order 0.1E+0, ifeverything is OK'

print *, '-----------------------------------------------------------------------'

print *, ' Note that nx should be even to use Poisson Library !'

print *, '-----------------------------------------------------------------------'

print *, ' DOUBLE PRECISION COMPUTATIONS

2941

Code Examples C

'

print *, '======================================================================='

print *, ''

! Defining the rectangular domain 0<x<1, 0<y<1 for 2D Poisson Solver<

ax=0.0D0

bx=1.0D0

ay=0.0D0

by=1.0D0

!*******************************************************************************

! Setting the coefficient q to 0.

! Note that this is the way to use Helmholtz Solver to solve Poisson problem!

!*******************************************************************************

q=0.0D0

! Computing the mesh size hx in x-direction

lx=bx-ax

hx=lx/nx

! Computing the mesh size hy in y-direction

ly=by-ay

hy=ly/ny

! Filling in the values of the TRUE solution u(x,y)=sin(2*pi*x)*sin(2*pi*y)+1

! in the mesh points into the array u

! Filling in the right-hand side f(x,y)=(8*pi*pi+q)*sin(2*pi*x)*sin(2*pi*y)+q

! in the mesh points into the array f.

2942

! We choose the right-hand side to correspond to the TRUE solution of Poissonequation.

! Here we are using the mesh sizes hx and hy computed before to compute

! the coordinates (xi,yi) of the mesh points

do iy=1,ny+1

do ix=1,nx+1

xi=hx*(ix-1)/lx

yi=hy*(iy-1)/ly

cx=dsin(2*pi*xi)

cy=dsin(2*pi*yi)

u(ix,iy)=1.0D0*cx*cy

f(ix,iy)=(8.0D0*pi**2)*u(ix,iy)

u(ix,iy)=u(ix,iy)+1.0D0

enddo

enddo

! Setting the type of the boundary conditions on each side of the rectangulardomain:

! On the boundary laying on the line x=0(=ax) Dirichlet boundary conditionwill be used

! On the boundary laying on the line x=1(=bx) Dirichlet boundary conditionwill be used

! On the boundary laying on the line y=0(=ay) Neumann boundary conditionwill be used

! On the boundary laying on the line y=1(=by) Neumann boundary conditionwill be used

BCtype = 'DDNN'

! Setting the values of the boundary function G(x,y) that is equal to theTRUE solution

2943

Code Examples C

! in the mesh points laying on Dirichlet boundaries

do iy = 1,ny+1

bd_ax(iy) = 1.0D0

bd_bx(iy) = 1.0D0

enddo

! Setting the values of the boundary function g(x,y) that is equal to thenormal derivative

! of the TRUE solution in the mesh points laying on Neumann boundaries

do ix = 1,nx+1

bd_ay(ix) = -2.0*pi*dsin(2*pi*(ix-1)/nx)

bd_by(ix) = 2.0*pi*dsin(2*pi*(ix-1)/nx)

enddo

! Initializing ipar array to make it free from garbage

do i=1,128

ipar(i)=0

enddo

! Initializing simple data structures of Poisson Library for 2D PoissonSolver

call d_init_Helmholtz_2D(ax, bx, ay, by, nx, ny, BCtype, q, ipar, dpar,stat)

if (stat.ne.0) goto 999

! Initializing complex data structures of Poisson Library for 2D PoissonSolver

! NOTE: Right-hand side f may be altered after the Commit step. If you wantto keep it,

! you should save it in another memory location!

call d_commit_Helmholtz_2D(f, bd_ax, bd_bx, bd_ay, bd_by, xhandle, ipar,dpar, stat)

2944



! Computing the approximate solution of 2D Poisson problem

! NOTE: Boundary data stored in the arrays bd_ax, bd_bx, bd_ay, bd_by shouldnot be changed

! between the Commit step and the subsequent call to the Solver routine!

! Otherwise the results may be wrong.

call d_Helmholtz_2D(f, bd_ax, bd_bx, bd_ay, bd_by, xhandle, ipar, dpar,stat)


! Cleaning the memory used by xhandle

call free_Helmholtz_2D(xhandle, ipar, stat)


! Now we can use xhandle to solve another 2D Poisson problem


write(*,10) nx

write(*,11) ny

print *, ''

! Watching the error along the line x=hx

ix=2

do iy=1,ny+1

write(*,12) (ix-1)*hx, (iy-1)*hy, f(ix,iy)-u(ix,iy)

enddo

print *, ''

2945

Code Examples C

! Success message to print if everything is OK

print *, ' Double precision 2D Poisson example has successfully PASSED'

print *, ' through all steps of computation!'


go to 1


999 print *, 'Double precision 2D Poisson example FAILED to compute thesolution...'

1 continue

10 format(1x,'The number of mesh intervals in x-direction is nx=',I1)

11 format(1x,'The number of mesh intervals in y-direction is ny=',I1)

12 format(1x,'In the mesh point (',F5.3,',',F5.3,') the error betweenthe computed and the true solution is equal to ', E10.3)


end

Spherical case

The code below computes an approximate solution of a Helmholtz problem on the entire sphere:

with aφ = 0, bφ = 2π, aθ = 0, bθ = π, and a boundary condition

2946


at the poles (periodic case).

The exact solution is known to be u(φ,θ) = cosθ .

Example C-38 implements the computations in C and Example C-39 provides Fortran-90 code.Mind that PL interface cannot be invoked from Fortran-77 due to restrictions imposed by theuse of Intel MKL DFT interface.

The algorithm of computing the approximate solution of a Poisson Problem uses Poisson Libraryroutines, described in chapter 13. Details of the Poisson problem being solved and the errorsare printed out between the computed solution and the exact one. Upon successful executionof Example C-38 the following text is printed out (Example C-39produces similar output):

Example of use of MKL Poisson Library

**********************************************

This example gives the solution of Helmholtz problem on a whole sphere

0<p<2*pi, 0<t<pi, with Helmholtz coefficient q=1 and right-hand side

f(p,t)=3*cos(t)

-----------------------------------------------------------------------

In general, the error should be of order O(1.0/np^2+1.0/nt^2)

For this example, the value of np=nt is 8

The approximation error should be of order 1.5e-01, if everything is OK

-----------------------------------------------------------------------

Note that np should be divisible by 4 to solve the PERIODIC problem!

-----------------------------------------------------------------------


2947

Code Examples C

=======================================================================

NO3NO4The number of mesh intervals in phi-direction is np=8

The number of mesh intervals in theta-direction is nt=8

In the mesh point (0.785,0.000) the error between the computed and the truesolution is equal to -1.402e-001



In the mesh point (0.785,1.178) the error between the computed and the truesolution is equal to 2.964e-004






Double precision Helmholtz example on a whole sphere has successfully PASSED

through all steps of computation!

Note that actual figures in the error may slightly differ from the figures printed abovedependingon the architecture and operating system used to run the example.

C code for the problem is presented below:

2948


Example C-38 C Example to Solve Helmholtz Problem on a Sphere/*******************************************************************************


/* Copyright(C) 2006 Intel Corporation. All Rights Reserved.


/* the source code ("Material") are owned by Intel Corporation or itssuppliers

/* or licensors. Title to the Material remains with Intel Corporationor its

/* suppliers and licensors. The Material contains trade secrets andproprietary


/* Material is protected by worldwide copyright and trade secretlaws and


/* modified, published, uploaded, posted, transmitted, distributed ordisclosed

/* in any way without Intel's prior express written permission.






/*

/*******************************************************************************

/* Content:

/* C double precision example of solving Helmholtz problem on a whole sphere

2949

Code Examples C

/* using MKL Poisson Library

/*

/*******************************************************************************/

#include <stdio.h>

#include <malloc.h>

#include <math.h>

/* Include Poisson Library header files */


#include "mkl_poisson.h"

int main(void)

{

/* Note that the size of the transform np must be divisible by 4 !!! */

int np=8, nt=8;

double pi=3.14159265358979324;

int ip, it, i, stat;

int ipar[128];

double ap, bp, at, bt, lp, lt, hp, ht, theta_i, ct;

double *dpar, *f, *u;

double q;

DFTI_DESCRIPTOR_HANDLE handle_s = 0;

DFTI_DESCRIPTOR_HANDLE handle_c = 0;

/* Printing the header for the example */

printf("\n Example of use of MKL Poisson Library\n");

printf(" **********************************************\n\n");

printf(" This example gives the solution of Helmholtz problem on a wholesphere\n");

printf(" 0<p<2*pi, 0<t<pi, with Helmholtz coefficient q=1 and right-handside\n");

printf(" f(p,t)=3*cos(t)\n");

2950

printf("-----------------------------------------------------------------------\n");

printf(" In general, the error should be of order O(1.0/np^2+1.0/nt^2)\n");

printf(" For this example, the value of np=nt is %d\n", np);

printf(" The approximation error should be of order 1.5e-01, if everythingis OK\n");

printf("-----------------------------------------------------------------------\n");

printf(" Note that np should be divisible by 4 to solve the PERIODICproblem!\n");

printf("-----------------------------------------------------------------------\n");


printf("=======================================================================\n\n");

dpar=(double*)malloc((7*np/2+10)*sizeof(double));

f=(double*)malloc((np+1)*(nt+1)*sizeof(double));

u=(double*)malloc((np+1)*(nt+1)*sizeof(double));

/* Defining the rectangular domain on a sphere 0<p<2*pi, 0<t<pi for HelmholtzSolver on a sphere */

/* Poisson Library will automatically detect that this problem is on a wholesphere! */

ap=0.0E0;

bp=2*pi;

at=0.0E0;

bt=pi;

/* Setting the coefficient q to 1.0E0 for Helmholtz problem */

/* If you like to solve Poisson problem, please set q to 0.0E0 */

q=1.0E0;

/* Computing the mesh size hp in phi-direction */

lp=bp-ap;

2951

Code Examples C

hp=lp/np;

/* Computing the mesh size ht in theta-direction */

lt=bt-at;

ht=lt/nt;

/* Filling in the values of the TRUE solution u(p,t)=cos(t)

in the mesh points into the array u

Filling in the right-hand side f(p,t)=(2+q)*cos(t)

in the mesh points into the array f.

We choose the right-hand side to correspond to the TRUE solution of Helmholtzequation on a sphere.

Here we are using the mesh sizes hp and ht computed before to compute

the coordinates (phi_i,theta_i) of the mesh points */

for(it=0;it<=nt;it++)

{</codeblock><codeblock> for(ip=0;ip<=np;ip++)

{

theta_i=ht*it;

ct=cos(theta_i);

u[ip+it*(np+1)]=ct;

f[ip+it*(np+1)]=ct*(2.+q);

}

}

/* Initializing ipar array to make it free from garbage */

for(i=0;i<128;i++)

{

ipar[i]=0;

}

2952

/* Initializing simple data structures of Poisson Library for HelmholtzSolver on a sphere */

/* As we are looking for the solution on a whole sphere, this is a PERIDOCproblem */

/* Therefore, the routines ending with "_p" are used to find the solution*/

d_init_sph_p(&ap,&bp,&at,&bt,&np,&nt,&q,ipar,dpar,&stat);


/* Initializing complex data structures of Poisson Library for HelmholtzSolver on a sphere

NOTE: Right-hand side f may be altered after the Commit step. If you wantto keep it,

you should save it in another memory location! */

d_commit_sph_p(f,&handle_s,&handle_c,ipar,dpar,&stat);


/* Computing the approximate solution of Helmholtz problem on a whole sphere*/

d_sph_p(f,&handle_s,&handle_c,ipar,dpar,&stat);


/* Cleaning the memory used by handle_s and handle_c */

free_sph_p(&handle_s,&handle_c,ipar,&stat);


/* Now we can use handle_s and handle_c to solve another Helmholtz problem*/

/* after a proper initialization */

2953

Code Examples C

printf("The number of mesh intervals in phi-direction is np=%d\n", np);

printf("The number of mesh intervals in theta-direction is nt=%d\n\n", nt);

/* Watching the error along the line phi=hp */

ip=1;

for(it=0;it<=nt;it++)

{

printf("In the mesh point (%5.3f,%5.3f) the error between the computed andthe true solution is equal to %10.3e\n", ip*hp, it*ht,f[ip+it*(np+1)]-u[ip+it*(np+1)]);

}

/* Success message to print if everything is OK */

printf("\n Double precision Helmholtz example on a whole sphere hassuccessfully PASSED\n");

printf(" through all steps of computation!\n");


goto SUCCESS;


FAILURE: printf("\nDouble precision Helmholtz example on a whole sphere hasFAILED to compute the solution...\n");

return -1;

SUCCESS: return 0;


}

Fortran-90 code for the problem is presented below:

2954

Example C-39 Fortran Example to Solve Helmholtz Problem on a Sphere!*******************************************************************************

















!

!*******************************************************************************

! Content:

! Fortran-90 precision example of solving Helmholtz problem on a whole

2955

Code Examples C

sphere

! using MKL Poisson Library

!

!*******************************************************************************

!

program d_sph_with_poles

! Include modules defined by mkl_poisson.f90 and mkl_dfti.f90 header files

use mkl_dfti

use mkl_poisson

implicit none

integer np,nt

parameter(np=8,nt=8)

double precision pi

parameter(pi=3.14159265358979324D0)

double precision ap,bp,hp,at,bt,ht,q,lp,lt,theta_i,ct

double precision u(np+1,nt+1),f(np+1,nt+1)

type(DFTI_DESCRIPTOR), pointer :: handle_s, handle_c

integer stat

double precision dpar(7*np/2+10)

integer ip,it,i

integer ipar(128)

2956


print *, ''

print *, ' Example of use of MKL Poisson Library'

print *, ' **********************************************'

print *, ''

print *, ' This example gives the solution of Helmholtz problem on awhole sphere'

print *, ' 0<p<2*pi, 0<t<pi, with Helmholtz coefficient q=1 andright-hand side'

print *, ' f(p,t)=3*cos(t)'

print *, '-----------------------------------------------------------------------'

print *, ' In general, the error should be of orderO(1.0/np^2+1.0/nt^2)'

print '(1x,a,I1)', ' For this example, the value of np=nt is ', np

print *, ' The approximation error should be of order 0.15E+0, ifeverything is OK'

print *, '-----------------------------------------------------------------------'

print *, ' Note that np should be divisible by 4 to solve the PERIODICproblem!'

print *, '-----------------------------------------------------------------------'

print *, ' DOUBLE PRECISION COMPUTATIONS'

print *, '======================================================================='

print *, ''

2957

Code Examples C

! Defining the rectangular domain on a sphere 0<p<2*pi, 0<t<pi for HelmholtzSolver on a sphere

! Poisson Library will automatically detect that this problem is on a wholesphere!

ap=0.0D0

bp=2*pi

at=0.0D0

bt=pi

! Setting the coefficient q to 1.0D0 for Helmholtz problem

! If you like to solve Poisson problem, please set q to 0.0D0

q=1.0D0

! Computing the mesh size hp in phi-direction

lp=bp-ap

hp=lp/np

! Computing the mesh size ht in theta-direction

lt=bt-at

ht=lt/nt

! Filling in the values of the TRUE solution u(p,t)=cos(t)

! in the mesh points into the array u

! Filling in the right-hand side f(p,t)=3*cos(t)

! in the mesh points into the array f.

! We choose the right-hand side to correspond to the TRUE solution ofHelmholtz equation.

! Here we are using the mesh sizes hp and ht computed before to compute

2958

! the coordinates (phi_i,theta_i) of the mesh points

do it=1,nt+1

do ip=1,np+1

theta_i=ht*(it-1)

ct=dcos(theta_i)

u(ip,it)=ct

f(ip,it)=(2.0D0+q)*ct

enddo

enddo

! Initializing ipar array to make it free from garbage

do i=1,128

ipar(i)=0

enddo

! Initializing simple data structures of Poisson Library for Helmholtz Solveron a sphere

! As we are looking for the solution on a whole sphere, this is a PERIDOCproblem

! Therefore, the routines ending with "_P" are used to find the solution

call D_INIT_SPH_P(ap,bp,at,bt,np,nt,q,ipar,dpar,stat)


! Initializing complex data structures of Poisson Library for HelmholtzSolver on a sphere

! NOTE: Right-hand side f may be altered after the Commit step. If you wantto keep it,

2959

Code Examples C

! you should save it in another memory location!

call D_COMMIT_SPH_P(f,handle_s,handle_c,ipar,dpar,stat)


! Computing the approximate solution of Helmholtz problem on a whole sphere

call D_SPH_P(f,handle_s,handle_c,ipar,dpar,stat)


! Cleaning the memory used by handle_s and handle_c

call FREE_SPH_P(HANDLE_S,HANDLE_C,IPAR,STAT)


! Now we can use handle_s and handle_c to solve another Helmholtz problemafter a proper initialization


write(*,10) np

write(*,11) nt

print *, ''

! Watching the error along the line phi=hp

ip=1

do it=1,nt+1

write(*,12) (ip-1)*hp, (it-1)*ht, f(ip,it)-u(ip,it)

enddo

print *, ''

2960


! Success message to print if everything is OK

print *, ' Double precision Helmholtz example on a whole sphere hassuccessfully PASSED'

print *, ' through all steps of computation!'


go to 1


999 print *, 'Double precision Helmholtz example on a whole sphere FAILEDto compute the solution...'

1 continue

10 format(1x,'The number of mesh intervals in x-direction is np=',I1)

11 format(1x,'The number of mesh intervals in y-direction is nt=',I1)

12 format(1x,'In the mesh point (',F5.3,',',F5.3,') the error between thecomputed and the true solution is equal to ', E10.3)


end

2961

Code Examples C

DCBLAS Interface to the BLAS

This appendix presents CBLAS, the C interface to the Basic Linear Algebra Subprograms (BLAS) implementedin Intel® MKL.

Similar to BLAS, the CBLAS interface includes the following levels of functions:

• “Level 1 CBLAS” (vector-vector operations)

• “Level 2 CBLAS” (matrix-vector operations)

• “Level 3 CBLAS” (matrix-matrix operations).

• “Sparse CBLAS” (operations on sparse vectors).

To obtain the C interface, the Fortran routine names are prefixed with cblas_ (for example, dasum becomescblas_dasum). Names of all CBLAS functions are in lowercase letters.

Complex functions ?dotc and ?dotu become CBLAS subroutines (void functions); they return the complexresult via a void pointer, added as the last parameter. CBLAS names of these functions are suffixed with_sub. For example, the BLAS function cdotc corresponds to cblas_cdotc_sub.

In the descriptions of CBLAS interfaces, links provided for each function group lead to the descriptions ofthe respective Fortran-interface BLAS functions.

CBLAS ArgumentsThe arguments of CBLAS functions obey the following rules:

• Input arguments are declared with the const modifier.

• Non-complex scalar input arguments are passed by value.

• Complex scalar input arguments are passed as void pointers.

• Array arguments are passed by address.

• Output scalar arguments are passed by address.

• BLAS character arguments are replaced by the appropriate enumerated type.

• Level 2 and Level 3 routines acquire an additional parameter of type CBLAS_ORDER as their firstargument. This parameter specifies whether two-dimensional arrays are row-major(CblasRowMajor) or column-major (CblasColMajor).

2963

Enumerated Types

The CBLAS interface uses the following enumerated types:

enum CBLAS_ORDER {

CblasRowMajor=101, /* row-major arrays */

CblasColMajor=102}; /* column-major arrays */

enum CBLAS_TRANSPOSE {

CblasNoTrans=111, /* trans='N' */

CblasTrans=112, /* trans='T' */

CblasConjTrans=113}; /* trans='C' */

enum CBLAS_UPLO {

CblasUpper=121, /* uplo ='U' */

CblasLower=122}; /* uplo ='L' */

enum CBLAS_DIAG {

CblasNonUnit=131, /* diag ='N' */

CblasUnit=132}; /* diag ='U' */

enum CBLAS_SIDE {

CblasLeft=141, /* side ='L' */

CblasRight=142}; /* side ='R' */

Level 1 CBLASThis is an interface to “BLAS Level 1 Routines and Functions”, which perform basic vector-vectoroperations.

?asum

float cblas_sasum(const int N, const float *X, const int incX);

double cblas_dasum(const int N, const double *X, const int incX);

float cblas_scasum(const int N, const void *X, const int incX);

double cblas_dzasum(const int N, const void *X, const int incX);

2964

D Intel® Math Kernel Library Reference Manual

?axpy

void cblas_saxpy(const int N, const float alpha, const float *X, const intincX, float *Y, const int incY);

void cblas_daxpy(const int N, const double alpha, const double *X, const intincX, double *Y, const int incY);

void cblas_caxpy(const int N, const void *alpha, const void *X, const intincX, void *Y, const int incY);

void cblas_zaxpy(const int N, const void *alpha, const void *X, const intincX, void *Y, const int incY);

?copy

void cblas_scopy(const int N, const float *X, const int incX, float *Y, constint incY);

void cblas_dcopy(const int N, const double *X, const int incX, double *Y,const int incY);

void cblas_ccopy(const int N, const void *X, const int incX, void *Y, constint incY);

void cblas_zcopy(const int N, const void *X, const int incX, void *Y, constint incY);

?dot

float cblas_sdot(const int N, const float *X, const int incX, const float*Y, const int incY);

double cblas_ddot(const int N, const double *X, const int incX, const double*Y, const int incY);

?sdot

float cblas_sdsdot(const int N, const float *SB, const float *SX, const intincX, const float *SY, const int incY);

double cblas_dsdot(const int N, const float *SX, const int incX, const float*SY, const int incY);

2965

CBLAS Interface to the BLAS D

?dotc

void cblas_cdotc_sub(const int N, const void *X, const int incX, const void*Y, const int incY, void *dotc);

void cblas_zdotc_sub(const int N, const void *X, const int incX, const void*Y, const int incY, void *dotc);

?dotu

void cblas_cdotu_sub(const int N, const void *X, const int incX, const void*Y, const int incY, void *dotu);

void cblas_zdotu_sub(const int N, const void *X, const int incX, const void*Y, const int incY, void *dotu);

?nrm2

float cblas_snrm2(const int N, const float *X, const int incX);

double cblas_dnrm2(const int N, const double *X, const int incX);

float cblas_scnrm2(const int N, const void *X, const int incX);

double cblas_dznrm2(const int N, const void *X, const int incX);

?rot

void cblas_srot(const int N, float *X, const int incX, float *Y, const intincY, const float c, const float s);

void cblas_drot(const int N, double *X, const int incX, double *Y,const intincY, const double c, const double s);

?rotg

void cblas_srotg(float *a, float *b, float *c, float *s);

void cblas_drotg(double *a, double *b, double *c, double *s);

2966


?rotm

void cblas_srotm(const int N, float *X, const int incX, float *Y, const intincY, const float *P);

void cblas_drotm(const int N, double *X, const int incX, double *Y, constint incY, const double *P);

?rotmg

void cblas_srotmg(float *d1, float *d2, float *b1, const float b2, float*P);

void cblas_drotmg(double *d1, double *d2, double *b1, const double b2, double*P);

?scal

void cblas_sscal(const int N, const float alpha, float *X, const int incX);

void cblas_dscal(const int N, const double alpha, double *X, const int incX);

void cblas_cscal(const int N, const void *alpha, void *X, const int incX);

void cblas_zscal(const int N, const void *alpha, void *X, const int incX);

void cblas_csscal(const int N, const float alpha, void *X, const int incX);

void cblas_zdscal(const int N, const double alpha, void *X, const int incX);

?swap

void cblas_sswap(const int N, float *X, const int incX, float *Y, const intincY);

void cblas_dswap(const int N, double *X, const int incX, double *Y, constint incY);

2967


void cblas_cswap(const int N, void *X, const int incX, void *Y, const intincY);

void cblas_zswap(const int N, void *X, const int incX, void *Y, const intincY);

i?amax

CBLAS_INDEX cblas_isamax(const int N, const float *X, const int incX);

CBLAS_INDEX cblas_idamax(const int N, const double *X, const int incX);

CBLAS_INDEX cblas_icamax(const int N, const void *X, const int incX);

CBLAS_INDEX cblas_izamax(const int N, const void *X, const int incX);

i?amin

CBLAS_INDEX cblas_isamin(const int N, const float *X, const int incX);

CBLAS_INDEX cblas_idamin(const int N, const double *X, const int incX);

CBLAS_INDEX cblas_icamin(const int N, const void *X, const int incX);

CBLAS_INDEX cblas_izamin(const int N, const void *X, const int incX);

Level 2 CBLASThis is an interface to “BLAS Level 2 Routines”, which perform basic matrix-vector operations.Each C routine in this group has an additional parameter of type CBLAS_ORDER (the firstargument) that determines whether the two-dimensional arrays use column-major or row-majorstorage.

2968


?gbmv

void cblas_sgbmv(const enum CBLAS_ORDER order, const enum CBLAS_TRANSPOSETransA, const int M, const int N, const int KL, const int KU, const floatalpha, const float *A, const int lda, const float *X, const int incX, constfloat beta, float *Y, const int incY);

void cblas_dgbmv(const enum CBLAS_ORDER order, const enum CBLAS_TRANSPOSETransA, const int M, const int N, const int KL, const int KU, const doublealpha, const double *A, const int lda, const double *X, const int incX, constdouble beta, double *Y, const int incY);

void cblas_cgbmv(const enum CBLAS_ORDER order, const enum CBLAS_TRANSPOSETransA, const int M, const int N, const int KL, const int KU, const void*alpha, const void *A, const int lda, const void *X, const int incX, constvoid *beta, void *Y, const int incY);

void cblas_zgbmv(const enum CBLAS_ORDER order, const enum CBLAS_TRANSPOSETransA, const int M, const int N, const int KL, const int KU, const void*alpha, const void *A, const int lda, const void *X, const int incX, constvoid *beta, void *Y, const int incY);

?gemv

void cblas_sgemv(const enum CBLAS_ORDER order, const enum CBLAS_TRANSPOSETransA, const int M, const int N, const float alpha, const float *A, constint lda, const float *X, const int incX, const float beta, float *Y, constint incY);

void cblas_dgemv(const enum CBLAS_ORDER order, const enum CBLAS_TRANSPOSETransA, const int M, const int N, const double alpha, const double *A, constint lda, const double *X, const int incX, const double beta, double *Y,const int incY);

void cblas_cgemv(const enum CBLAS_ORDER order, const enum CBLAS_TRANSPOSETransA, const int M, const int N, const void *alpha, const void *A, constint lda, const void *X, const int incX, const void *beta, void *Y, const intincY);

void cblas_zgemv(const enum CBLAS_ORDER order, const enum CBLAS_TRANSPOSETransA, const int M, const int N, const void *alpha, const void *A, constint lda, const void *X, const int incX, const void *beta, void *Y, const intincY);

2969


?ger

void cblas_sger(const enum CBLAS_ORDER order, const int M, const int N, constfloat alpha, const float *X, const int incX, const float *Y, const int incY,float *A, const int lda);

void cblas_dger(const enum CBLAS_ORDER order, const int M, const int N, constdouble alpha, const double *X, const int incX, const double *Y, const intincY, double *A, const int lda);

?gerc

void cblas_cgerc(const enum CBLAS_ORDER order, const int M, const int N,const void *alpha, const void *X, const int incX, const void *Y, const intincY, void *A, const int lda);

void cblas_zgerc(const enum CBLAS_ORDER order, const int M, const int N,const void *alpha, const void *X, const int incX, const void *Y, const intincY, void *A, const int lda);

?geru

void cblas_cgeru(const enum CBLAS_ORDER order, const int M, const int N,const void *alpha, const void *X, const int incX, const void *Y, const intincY, void *A, const int lda);

void cblas_zgeru(const enum CBLAS_ORDER order, const int M, const int N,const void *alpha, const void *X, const int incX, const void *Y, const intincY, void *A, const int lda);

?hbmv

void cblas_chbmv(const enum CBLAS_ORDER order, const enum CBLAS_UPLO Uplo,const int N, const int K, const void *alpha, const void *A, const int lda,const void *X, const int incX, const void *beta, void *Y, const int incY);

void cblas_zhbmv(const enum CBLAS_ORDER order, const enum CBLAS_UPLO Uplo,const int N, const int K, const void *alpha, const void *A, const int lda,const void *X, const int incX, const void *beta, void *Y, const int incY);

2970


?hemv

void cblas_chemv(const enum CBLAS_ORDER order, const enum CBLAS_UPLO Uplo,const int N, const void *alpha, const void *A, const int lda, const void *X,const int incX, const void *beta, void *Y, const int incY);

void cblas_zhemv(const enum CBLAS_ORDER order, const enum CBLAS_UPLO Uplo,const int N, const void *alpha, const void *A, const int lda, const void *X,const int incX, const void *beta, void *Y, const int incY);

?her

void cblas_cher(const enum CBLAS_ORDER order, const enum CBLAS_UPLO Uplo,const int N, const float alpha, const void *X, const int incX, void *A, constint lda);

void cblas_zher(const enum CBLAS_ORDER order, const enum CBLAS_UPLO Uplo,const int N, const double alpha, const void *X, const int incX, void *A,const int lda);

?her2

void cblas_cher2(const enum CBLAS_ORDER order, const enum CBLAS_UPLO Uplo,const int N, const void *alpha, const void *X, const int incX, const void*Y, const int incY, void *A, const int lda);

void cblas_zher2(const enum CBLAS_ORDER order, const enum CBLAS_UPLO Uplo,const int N, const void *alpha, const void *X, const int incX, const void*Y, const int incY, void *A, const int lda);

?hpmv

void cblas_chpmv(const enum CBLAS_ORDER order, const enum CBLAS_UPLO Uplo,const int N, const void *alpha, const void *Ap, const void *X, const intincX, const void *beta, void *Y, const int incY);

void cblas_zhpmv(const enum CBLAS_ORDER order, const enum CBLAS_UPLO Uplo,const int N, const void *alpha, const void *Ap, const void *X, const intincX, const void *beta, void *Y, const int incY);

2971


?hpr

void cblas_chpr(const enum CBLAS_ORDER order, const enum CBLAS_UPLO Uplo,const int N, const float alpha, const void *X, const int incX, void *A);

void cblas_zhpr(const enum CBLAS_ORDER order, const enum CBLAS_UPLO Uplo,const int N, const double alpha, const void *X, const int incX, void *A);

?hpr2

void cblas_chpr2(const enum CBLAS_ORDER order, const enum CBLAS_UPLO Uplo,const int N, const void *alpha, const void *X, const int incX, const void*Y, const int incY, void *Ap);

void cblas_zhpr2(const enum CBLAS_ORDER order, const enum CBLAS_UPLO Uplo,const int N, const void *alpha, const void *X, const int incX, const void*Y, const int incY, void *Ap);

?sbmv

void cblas_ssbmv(const enum CBLAS_ORDER order, const enum CBLAS_UPLO Uplo,const int N, const int K, const float alpha, const float *A, const int lda,const float *X, const int incX, const float beta, float *Y, const int incY);

void cblas_dsbmv(const enum CBLAS_ORDER order, const enum CBLAS_UPLO Uplo,const int N, const int K, const double alpha, const double *A, const intlda, const double *X, const int incX, const double beta, double *Y, constint incY);

?spmv

void cblas_sspmv(const enum CBLAS_ORDER order, const enum CBLAS_UPLO Uplo,const int N, const float alpha, const float *Ap, const float *X, const intincX, const float beta, float *Y, const int incY);

void cblas_dspmv(const enum CBLAS_ORDER order, const enum CBLAS_UPLO Uplo,const int N, const double alpha, const double *Ap, const double *X, constint incX, const double beta, double *Y, const int incY);

2972


?spr

void cblas_sspr(const enum CBLAS_ORDER order, const enum CBLAS_UPLO Uplo,const int N, const float alpha, const float *X, const int incX, float *Ap);

void cblas_dspr(const enum CBLAS_ORDER order, const enum CBLAS_UPLO Uplo,const int N, const double alpha, const double *X, const int incX, double*Ap);

?spr2

void cblas_sspr2(const enum CBLAS_ORDER order, const enum CBLAS_UPLO Uplo,const int N, const float alpha, const float *X, const int incX, const float*Y, const int incY, float *A);

void cblas_dspr2(const enum CBLAS_ORDER order, const enum CBLAS_UPLO Uplo,const int N, const double alpha, const double *X, const int incX, constdouble *Y, const int incY, double *A);

?symv

void cblas_ssymv(const enum CBLAS_ORDER order, const enum CBLAS_UPLO Uplo,const int N, const float alpha, const float *A, const int lda, const float*X, const int incX, const float beta, float *Y, const int incY);

void cblas_dsymv(const enum CBLAS_ORDER order, const enum CBLAS_UPLO Uplo,const int N, const double alpha, const double *A, const int lda, const double*X, const int incX, const double beta, double *Y, const int incY);

?syr

void cblas_ssyr(const enum CBLAS_ORDER order, const enum CBLAS_UPLO Uplo,const int N, const float alpha, const float *X, const int incX, float *A,const int lda);

void cblas_dsyr(const enum CBLAS_ORDER order, const enum CBLAS_UPLO Uplo,const int N, const double alpha, const double *X, const int incX, double *A,const int lda);

2973


?syr2

void cblas_ssyr2(const enum CBLAS_ORDER order, const enum CBLAS_UPLO Uplo,const int N, const float alpha, const float *X, const int incX, const float*Y, const int incY, float *A, const int lda);

void cblas_dsyr2(const enum CBLAS_ORDER order, const enum CBLAS_UPLO Uplo,const int N, const double alpha, const double *X, const int incX, constdouble *Y, const int incY, double *A, const int lda);

?tbmv

void cblas_stbmv(const enum CBLAS_ORDER order, const enum CBLAS_UPLO Uplo,const enum CBLAS_TRANSPOSE TransA, const enum CBLAS_DIAG Diag, const int N,const int K, const float *A, const int lda, float *X, const int incX);

void cblas_dtbmv(const enum CBLAS_ORDER order, const enum CBLAS_UPLO Uplo,const enum CBLAS_TRANSPOSE TransA, const enum CBLAS_DIAG Diag, const int N,const int K, const double *A, const int lda, double *X, const int incX);

void cblas_ctbmv(const enum CBLAS_ORDER order, const enum CBLAS_UPLO Uplo,const enum CBLAS_TRANSPOSE TransA, const enum CBLAS_DIAG Diag, const int N,const int K, const void *A, const int lda, void *X, const int incX);

void cblas_ztbmv(const enum CBLAS_ORDER order, const enum CBLAS_UPLO Uplo,const enum CBLAS_TRANSPOSE TransA, const enum CBLAS_DIAG Diag, const int N,const int K, const void *A, const int lda, void *X, const int incX);

?tbsv

void cblas_stbsv(const enum CBLAS_ORDER order, const enum CBLAS_UPLO Uplo,const enum CBLAS_TRANSPOSE TransA, const enum CBLAS_DIAG Diag, const int N,const int K, const float *A, const int lda, float *X, const int incX);

void cblas_dtbsv(const enum CBLAS_ORDER order, const enum CBLAS_UPLO Uplo,const enum CBLAS_TRANSPOSE TransA, const enum CBLAS_DIAG Diag, const int N,const int K, const double *A, const int lda, double *X, const int incX);

void cblas_ctbsv(const enum CBLAS_ORDER order, const enum CBLAS_UPLO Uplo,const enum CBLAS_TRANSPOSE TransA, const enum CBLAS_DIAG Diag, const int N,const int K, const void *A, const int lda, void *X, const int incX);

2974


void cblas_ztbsv(const enum CBLAS_ORDER order, const enum CBLAS_UPLO Uplo,const enum CBLAS_TRANSPOSE TransA, const enum CBLAS_DIAG Diag, const int N,const int K, const void *A, const int lda, void *X, const int incX);

?tpmv

void cblas_stpmv(const enum CBLAS_ORDER order, const enum CBLAS_UPLO Uplo,const enum CBLAS_TRANSPOSE TransA, const enum CBLAS_DIAG Diag, const int N,const float *Ap, float *X, const int incX);

void cblas_dtpmv(const enum CBLAS_ORDER order, const enum CBLAS_UPLO Uplo,const enum CBLAS_TRANSPOSE TransA, const enum CBLAS_DIAG Diag, const int N,const double *Ap, double *X, const int incX);

void cblas_ctpmv(const enum CBLAS_ORDER order, const enum CBLAS_UPLO Uplo,const enum CBLAS_TRANSPOSE TransA, const enum CBLAS_DIAG Diag, const intN,const void *Ap, void *X, const int incX);

void cblas_ztpmv(const enum CBLAS_ORDER order, const enum CBLAS_UPLO Uplo,const enum CBLAS_TRANSPOSE TransA, const enum CBLAS_DIAG Diag, const intN,const void *Ap, void *X, const int incX);

?tpsv

void cblas_stpsv(const enum CBLAS_ORDER order, const enum CBLAS_UPLO Uplo,const enum CBLAS_TRANSPOSE TransA, const enum CBLAS_DIAG Diag, const intN,const float *Ap, float *X, const int incX);

void cblas_dtpsv(const enum CBLAS_ORDER order, const enum CBLAS_UPLO Uplo,const enum CBLAS_TRANSPOSE TransA, const enum CBLAS_DIAG Diag, const intN,const double *Ap, double *X, const int incX);

void cblas_ctpsv(const enum CBLAS_ORDER order, const enum CBLAS_UPLO Uplo,const enum CBLAS_TRANSPOSE TransA, const enum CBLAS_DIAG Diag, const intN,const void *Ap, void *X, const int incX);

void cblas_ztpsv(const enum CBLAS_ORDER order, const enum CBLAS_UPLO Uplo,const enum CBLAS_TRANSPOSE TransA, const enum CBLAS_DIAG Diag, const intN,const void *Ap, void *X, const int incX);

2975


?trmv

void cblas_strmv(const enum CBLAS_ORDER order, const enum CBLAS_UPLO Uplo,const enum CBLAS_TRANSPOSE TransA, const enum CBLAS_DIAG Diag, const intN,const float *A, const int lda, float *X, const int incX);

void cblas_dtrmv(const enum CBLAS_ORDER order, const enum CBLAS_UPLO Uplo,const enum CBLAS_TRANSPOSE TransA, const enum CBLAS_DIAG Diag, const intN,const double *A, const int lda, double *X, const int incX);

void cblas_ctrmv(const enum CBLAS_ORDER order, const enum CBLAS_UPLO Uplo,const enum CBLAS_TRANSPOSE TransA, const enum CBLAS_DIAG Diag, const intN,const void *A, const int lda, void *X, const int incX);

void cblas_ztrmv(const enum CBLAS_ORDER order, const enum CBLAS_UPLO Uplo,const enum CBLAS_TRANSPOSE TransA, const enum CBLAS_DIAG Diag, const intN,const void *A, const int lda, void *X, const int incX);

?trsv

void cblas_strsv(const enum CBLAS_ORDER order, const enum CBLAS_UPLO Uplo,const enum CBLAS_TRANSPOSE TransA, const enum CBLAS_DIAG Diag, const intN,const float *A, const int lda, float *X, const int incX);

void cblas_dtrsv(const enum CBLAS_ORDER order, const enum CBLAS_UPLO Uplo,const enum CBLAS_TRANSPOSE TransA, const enum CBLAS_DIAG Diag, const intN,const double *A, const int lda, double *X, const int incX);

void cblas_ctrsv(const enum CBLAS_ORDER order, const enum CBLAS_UPLO Uplo,const enum CBLAS_TRANSPOSE TransA, const enum CBLAS_DIAG Diag, const intN,const void *A, const int lda, void *X, const int incX);

void cblas_ztrsv(const enum CBLAS_ORDER order, const enum CBLAS_UPLO Uplo,const enum CBLAS_TRANSPOSE TransA, const enum CBLAS_DIAG Diag, const intN,const void *A, const int lda, void *X, const int incX);

2976


Level 3 CBLASThis is an interface to “BLAS Level 3 Routines”, which perform basic matrix-matrix operations.Each C routine in this group has an additional parameter of type CBLAS_ORDER (the firstargument) that determines whether the two-dimensional arrays use column-major or row-majorstorage.

?gemm

void cblas_sgemm(const enum CBLAS_ORDER Order, const enum CBLAS_TRANSPOSETransA, const enum CBLAS_TRANSPOSE TransB, const int M, const int N, constint K, const float alpha, const float *A, const int lda, const float *B,const int ldb, const float beta, float *C, const int ldc);

void cblas_dgemm(const enum CBLAS_ORDER Order, const enum CBLAS_TRANSPOSETransA, const enum CBLAS_TRANSPOSE TransB, const int M, const int N, constint K, const double alpha, const double *A, const int lda, const double *B,const int ldb, const double beta, double *C, const int ldc);

void cblas_cgemm(const enum CBLAS_ORDER Order, const enum CBLAS_TRANSPOSETransA, const enum CBLAS_TRANSPOSE TransB, const int M, const int N, constint K, const void *alpha, const void *A, const int lda, const void *B, constint ldb, const void *beta, void *C, const int ldc);

void cblas_zgemm(const enum CBLAS_ORDER Order, const enum CBLAS_TRANSPOSETransA, const enum CBLAS_TRANSPOSE TransB, const int M, const int N, constint K, const void *alpha, const void *A, const int lda, const void *B, constint ldb, const void *beta, void *C, const int ldc);

?hemm

void cblas_chemm(const enum CBLAS_ORDER Order, const enum CBLAS_SIDE Side,const enum CBLAS_UPLO Uplo, const int M, const int N, const void *alpha,const void *A, const int lda, const void *B, const int ldb, const void *beta,void *C, const int ldc);

void cblas_zhemm(const enum CBLAS_ORDER Order, const enum CBLAS_SIDE Side,const enum CBLAS_UPLO Uplo, const int M, const int N, const void *alpha,const void *A, const int lda, const void *B, const int ldb, const void *beta,void *C, const int ldc);

2977


?herk

void cblas_cherk(const enum CBLAS_ORDER Order, const enum CBLAS_UPLO Uplo,const enum CBLAS_TRANSPOSE Trans, const int N, const int K, const floatalpha, const void *A, const int lda, const float beta, void *C, const intldc);

void cblas_zherk(const enum CBLAS_ORDER Order, const enum CBLAS_UPLO Uplo,const enum CBLAS_TRANSPOSE Trans, const int N, const int K, const doublealpha, const void *A, const int lda, const double beta, void *C, const intldc);

?her2k

void cblas_cher2k(const enum CBLAS_ORDER Order, const enum CBLAS_UPLO Uplo,const enum CBLAS_TRANSPOSE Trans, const int N, const int K, const void*alpha, const void *A, const int lda, const void *B, const int ldb, constfloat beta, void *C, const int ldc);

void cblas_zher2k(const enum CBLAS_ORDER Order, const enum CBLAS_UPLO Uplo,const enum CBLAS_TRANSPOSE Trans, const int N, const int K, const void*alpha, const void *A, const int lda, const void *B, const int ldb, constdouble beta, void *C, const int ldc);

?symm

void cblas_ssymm(const enum CBLAS_ORDER Order, const enum CBLAS_SIDE Side,const enum CBLAS_UPLO Uplo, const int M, const int N, const float alpha,const float *A, const int lda, const float *B, const int ldb, const floatbeta, float *C, const int ldc);

void cblas_dsymm(const enum CBLAS_ORDER Order, const enum CBLAS_SIDE Side,const enum CBLAS_UPLO Uplo, const int M, const int N, const double alpha,const double *A, const int lda, const double *B, const int ldb, const doublebeta, double *C, const int ldc);

void cblas_csymm(const enum CBLAS_ORDER Order, const enum CBLAS_SIDE Side,const enum CBLAS_UPLO Uplo, const int M, const int N, const void *alpha,const void *A, const int lda, const void *B, const int ldb, const void *beta,void *C, const int ldc);

2978


void cblas_zsymm(const enum CBLAS_ORDER Order, const enum CBLAS_SIDE Side,const enum CBLAS_UPLO Uplo, const int M, const int N, const void *alpha,const void *A, const int lda, const void *B, const int ldb, const void *beta,void *C, const int ldc);

?syrk

void cblas_ssyrk(const enum CBLAS_ORDER Order, const enum CBLAS_UPLO Uplo,const enum CBLAS_TRANSPOSE Trans, const int N, const int K, const floatalpha, const float *A, const int lda, const float beta, float *C, const intldc);

void cblas_dsyrk(const enum CBLAS_ORDER Order, const enum CBLAS_UPLO Uplo,const enum CBLAS_TRANSPOSE Trans, const int N, const int K, const doublealpha, const double *A, const int lda, const double beta, double *C, constint ldc);

void cblas_csyrk(const enum CBLAS_ORDER Order, const enum CBLAS_UPLO Uplo,const enum CBLAS_TRANSPOSE Trans, const int N, const int K, const void *alpha,const void *A, const int lda, const void *beta, void *C, const int ldc);

void cblas_zsyrk(const enum CBLAS_ORDER Order, const enum CBLAS_UPLO Uplo,const enum CBLAS_TRANSPOSE Trans, const int N, const int K, const void *alpha,const void *A, const int lda, const void *beta, void *C, const int ldc);

?syr2k

void cblas_ssyr2k(const enum CBLAS_ORDER Order, const enum CBLAS_UPLO Uplo,const enum CBLAS_TRANSPOSE Trans, const int N, const int K, const floatalpha, const float *A, const int lda, const float *B, const int ldb, constfloat beta, float *C, const int ldc);

void cblas_dsyr2k(const enum CBLAS_ORDER Order, const enum CBLAS_UPLO Uplo,const enum CBLAS_TRANSPOSE Trans, const int N, const int K, const doublealpha, const double *A, const int lda, const double *B, const int ldb, constdouble beta, double *C, const int ldc);

void cblas_csyr2k(const enum CBLAS_ORDER Order, const enum CBLAS_UPLO Uplo,const enum CBLAS_TRANSP SE Trans, const int N, const int K, const void*alpha,const void *A, const int lda, const void *B, const int ldb, constvoid *beta, void *C, const int ldc);

2979


void cblas_zsyr2k(const enum CBLAS_ORDER Order, const enum CBLAS_UPLO Uplo,const enum CBLAS_TRANSPOSE Trans, const int N, const int K, const void*alpha, const void *A, const int lda, const void *B, const int ldb, constvoid *beta, void *C, const int ldc);

?trmm

void cblas_strmm(const enum CBLAS_ORDER Order, const enum CBLAS_SIDE Side,const enum CBLAS_UPLO Uplo, const enum CBLAS_TRANSPOSE TransA, const enumCBLAS_DIAG Diag, const int M, const int N, const float alpha, const float*A, const int lda, float *B, const int ldb);

void cblas_dtrmm(const enum CBLAS_ORDER Order, const enum CBLAS_SIDE Side,const enum CBLAS_UPLO Uplo, const enum CBLAS_TRANSPOSE TransA, const enumCBLAS_DIAG Diag, const int M, const int N, const double alpha, const double*A, const int lda, double *B, const int ldb);

void cblas_ctrmm(const enum CBLAS_ORDER Order, const enum CBLAS_SIDE Side,const enum CBLAS_UPLO Uplo, const enum CBLAS_TRANSPOSE TransA, const enumCBLAS_DIAG Diag, const int M, const int N, const void *alpha, const void *A,const int lda, void *B, const int ldb);

void cblas_ztrmm(const enum CBLAS_ORDER Order, const enum CBLAS_SIDE Side,const enum CBLAS_UPLO Uplo, const enum CBLAS_TRANSPOSE TransA, const enumCBLAS_DIAG Diag, const int M, const int N, const void *alpha, const void *A,const int lda, void *B, const int ldb);

?trsm

void cblas_strsm(const enum CBLAS_ORDER Order, const enum CBLAS_SIDE Side,const enum CBLAS_UPLO Uplo, const enum CBLAS_TRANSPOSE TransA, const enumCBLAS_DIAG Diag, const int M, const int N, const float alpha, const float*A, const int lda, float *B, const int ldb);

void cblas_dtrsm(const enum CBLAS_ORDER Order, const enum CBLAS_SIDE Side,const enum CBLAS_UPLO Uplo, const enum CBLAS_TRANSPOSE TransA, const enumCBLAS_DIAG Diag, const int M, const int N, const double alpha, const double*A, const int lda, double *B, const int ldb);

void cblas_ctrsm(const enum CBLAS_ORDER Order, const enum CBLAS_SIDE Side,const enum CBLAS_UPLO Uplo, const enum CBLAS_TRANSPOSE TransA, const enumCBLAS_DIAG Diag, const int M, const int N, const void *alpha, const void *A,const int lda, void *B, const int ldb);

2980


void cblas_ztrsm(const enum CBLAS_ORDER Order, const enum CBLAS_SIDE Side,const enum CBLAS_UPLO Uplo, const enum CBLAS_TRANSPOSE TransA, const enumCBLAS_DIAG Diag, const int M, const int N, const void *alpha, const void *A,const int lda, void *B, const int ldb);

Sparse CBLASThis is an interface to “Sparse BLAS Level 1 Routines and Functions”, which perform a numberof common vector operations on sparse vectors stored in compressed form.

Note that all index parameters, indx, are in C-type notation and vary in the range [0..N-1].

?axpyi

void cblas_saxpyi(const int N, const float alpha, const float *X, const int*indx, float *Y);

void cblas_daxpyi(const int N, const double alpha, const double *X, constint *indx, double *Y);

void cblas_caxpyi(const int N, const void *alpha, const void *X, const int*indx, void *Y);

void cblas_zaxpyi(const int N, const void *alpha, const void *X, const int*indx, void *Y);

?doti

float cblas_sdoti(const int N, const float *X, const int *indx, const float*Y);

double cblas_ddoti(const int N, const double *X, const int *indx, constdouble *Y);

?dotci

void cblas_cdotci_sub(const int N, const void *X, const int *indx, constvoid *Y, void *dotui);

void cblas_zdotci_sub(const int N, const void *X, const int *indx, constvoid *Y, void *dotui);

2981


?dotui

void cblas_cdotui_sub(const int N, const void *X, const int *indx, constvoid *Y, void *dotui);

void cblas_zdotui_sub(const int N, const void *X, const int *indx, constvoid *Y, void *dotui);

?gthr

void cblas_sgthr(const int N, const float *Y, float *X, const int *indx);

void cblas_dgthr(const int N, const double *Y, double *X, const int *indx);

void cblas_cgthr(const int N, const void *Y, void *X, const int *indx);

void cblas_zgthr(const int N, const void *Y, void *X, const int *indx);

?gthrz

void cblas_sgthrz(const int N, float *Y, float *X, const int *indx);

void cblas_dgthrz(const int N, double *Y, double *X, const int *indx);

void cblas_cgthrz(const int N, void *Y, void *X, const int *indx);

void cblas_zgthrz(const int N, void *Y, void *X, const int *indx);

?roti

void cblas_sroti(const int N, float *X, const int *indx, float *Y, constfloat c, const float s);

void cblas_droti(const int N, double *X, const int *indx, double *Y, constdouble c, const double s);

?sctr

void cblas_ssctr(const int N, const float *X, const int *indx, float *Y);

void cblas_dsctr(const int N, const double *X, const int *indx, double *Y);

void cblas_csctr(const int N, const void *X, const int *indx, void *Y);

void cblas_zsctr(const int N, const void *X, const int *indx, void *Y);

2982


ESpecific Features of Fortran-95Interfaces for LAPACK Routines

Intel® MKL implements Fortran-95 interface for LAPACK package, further referred to as MKL LAPACK-95,to provide full capacity of MKL Fortran-77 LAPACK routines. This is the principal difference of Intel MKLfrom the Netlib Fortran-95 implementation for LAPACK.

A new feature of MKL LAPACK-95 by comparison with Intel MKL LAPACK-77 implementation is presentinga package of source interfaces along with wrappers that make the implementation compiler-independent.As a result, the MKL LAPACK package can be used in all programming environments intended for Fortran-95.

Depending on the degree and type of difference from Netlib implementation, the MKL LAPACK-95 interfacesfall into several groups that require different transformations (see “MKL Fortran-95 Interfaces for LAPACKRoutines vs. Netlib Implementation”). The groups are given in full with the calling sequences of the routinesand appropriate differences from Netlib analogs.

The following conventions are used:<interface> ::= <name of interface> ‘(’ <arguments list>‘)’

<arguments list> ::= <first argument> {<argument>}*

<first argument> ::= < identifier >

<argument> ::= <required argument>|<optional argument>

<required argument> ::= ‘,’ <identifier>

<optional argument> ::= ‘[,’ <identifier> ‘]’

<name of interface> ::= <identifier>

where defined notions are separated from definitions by ::=, notion names are marked by angle brackets,terminals are given in quotes, and {…}* denotes repetition zero, one, or more times.

<first argument> and each <required argument> should be present in all calls of denoted interface,<optional argument> may be omitted. Comments to interface definitions are provided where necessary.Comment lines begin with character !.Two interfaces with one name are presented when two variants ofsubroutine calls (separated by types of arguments) exist.

2983

Interfaces Identical to NetlibGETRI(A, IPIV [,INFO])

GEEQU(A,R,C[,ROWCND][,COLCND][,AMAX][,INFO])

GESV(A,B[,IPIV][,INFO])

GESVX(A,B,X[,AF][,IPIV][,FACT][,TRANS][,EQUED][,R][,C][,FERR][,BERR]

[,RCOND][,RPVGRW][,INFO])

GBSV(A,B[,KL][,IPIV][,INFO])

GTSV(DL,D,DU,B[,INFO])

GTSVX(DL,D,DU,B,X[,DLF][,DF][,DUF][,DU2][,IPIV][,FACT][,TRANS][,FERR]

[,BERR][,RCOND][,INFO])

POSV(A,B[,UPLO][,INFO])

POSVX(A,B,X[,UPLO][,AF][,FACT][,EQUED][,S][,FERR][,BERR][,RCOND][,INFO])

PPSV(A,B[,UPLO][,INFO])

PPSVX(A,B,X[,UPLO][,AF][,FACT][,EQUED][,S][,FERR][,BERR][,RCOND][,INFO])

PBSV(A,B[,UPLO][,INFO])

PBSVX(A,B,X[,UPLO][,AF][,FACT][,EQUED][,S][,FERR][,BERR][,RCOND][,INFO])

PTSV(D,E,B[,INFO])

PTSVX(D,E,B,X[,DF][,EF][,FACT][,FERR][,BERR][,RCOND][,INFO])

SYSV(A,B[,UPLO][,IPIV][,INFO])

SYSVX(A,B,X[,UPLO][,AF][,IPIV][,FACT][,FERR][,BERR][,RCOND][,INFO])

HESVX(A,B,X[,UPLO][,AF][,IPIV][,FACT][,FERR][,BERR][,RCOND][,INFO])

SYTRD(A,TAU[,UPLO][,INFO])

ORGTR(A,TAU[,UPLO][,INFO])

HETRD(A,TAU[,UPLO][,INFO])

UNGTR(A,TAU[,UPLO][,INFO])

SYGST(A,B[,ITYPE][,UPLO][,INFO])

HEGST(A,B[,ITYPE][,UPLO][,INFO])

2984

E Intel® Math Kernel Library Reference Manual

GELS(A,B[,TRANS][,INFO])

GELSY(A,B[,RANK][,JPVT][,RCOND][,INFO])

GELSS(A,B[,RANK][,S][,RCOND][,INFO])

GELSD(A,B[,RANK][,S][,RCOND][,INFO])

GGLSE(A,B,C,D,X[,INFO])

GGGLM(A,B,D,X,Y[,INFO])

SYEV(A,W[,JOBZ][,UPLO][,INFO])

HEEV(A,W[,JOBZ][,UPLO][,INFO])

SYEVD(A,W[,JOBZ][,UPLO][,INFO])

SPEV(A,W[,UPLO][,Z][,INFO])

HPEV(A,W[,UPLO][,Z][,INFO])

SPEVD(A,W[,UPLO][,Z][,INFO])

HPEVD(A,W[,UPLO][,Z][,INFO])

SPEVX(A,W[,UPLO][,Z][,VL][,VU][,IL][,IU][,M][,IFAIL][,ABSTOL][,INFO])

HPEVX(A,W[,UPLO][,Z][,VL][,VU][,IL][,IU][,M][,IFAIL][,ABSTOL][,INFO])

SBEV(A,W[,UPLO][,Z][,INFO])

HBEV(A,W[,UPLO][,Z][,INFO])

SBEVD(A,W[,UPLO][,Z][,INFO])

HBEVD(A,W[,UPLO][,Z][,INFO])

SBEVX(A,W[,UPLO][,Z][,VL][,VU][,IL][,IU][,M][,IFAIL][,Q][,ABSTOL]

[,INFO])

HBEVX(A,W[,UPLO][,Z][,VL][,VU][,IL][,IU][,M][,IFAIL][,Q][,ABSTOL]

[,INFO])

HPGV(A,B,W[,ITYPE][,UPLO][,Z][,INFO])

STEV(D,E[,Z][,INFO])

STEVD(D,E[,Z][,INFO])

STEVX(D,E,W[,Z][,VL][,VU][,IL][,IU][,M][,IFAIL][,ABSTOL][,INFO])

STEVR(D,E,W[,Z][,VL][,VU][,IL][,IU][,M][,ISUPPZ][,ABSTOL][,INFO])

2985

Specific Features of Fortran-95 Interfaces for LAPACK Routines E

GEES(A,WR,WI[,VS][,SELECT][,SDIM][,INFO])

GEES(A,W[,VS][,SELECT][,SDIM][,INFO])

GEESX(A,WR,WI[,VS][,SELECT][,SDIM][,RCONDE][,RCONDV][,INFO])

GEESX(A,W[,VS][,SELECT][,SDIM][,RCONDE][,RCONDV][,INFO])

GEEV(A,WR,WI[,VL][,VR][,INFO])

GEEV(A,W[,VL][,VR][,INFO])

GEEVX(A,WR,WI[,VL][,VR][,BALANC][,ILO][,IHI][,SCALE][,ABNRM][,RCONDE][,RCONDV][,INFO])

GEEVX(A,W[,VL][,VR][,BALANC][,ILO][,IHI][,SCALE][,ABNRM][,RCONDE]

[,RCONDV][,INFO])

GESVD(A,S[,U][,VT][,WW][,JOB][,INFO])

GGSVD(A,B,ALPHA,BETA[,K][,L][,U][,V][,Q][,IWORK][,INFO])

SYGV(A,B,W[,ITYPE][,JOBZ][,UPLO][,INFO])

HEGV(A,B,W[,ITYPE][,JOBZ][,UPLO][,INFO])

SYGVD(A,B,W[,ITYPE][,JOBZ][,UPLO][,INFO])

HEGVD(A,B,W[,ITYPE][,JOBZ][,UPLO][,INFO])

SPGV(A,B,W[,ITYPE][,UPLO][,Z][,INFO])

SBGV(A,B,W[,UPLO][,Z][,INFO])

HBGV(A,B,W[,UPLO][,Z][,INFO])

GGES(A,B,ALPHAR,ALPHAI,BETA[,VSL][,VSR][,SELECT][,SDIM][,INFO])

GGES(A,B,ALPHA,BETA[,VSL][,VSR][,SELECT][,SDIM][,INFO])

GGESX(A,B,ALPHAR,ALPHAI,BETA[,VSL][,VSR][,SELECT][,SDIM][,RCONDE][,RCONDV][,INFO])

GGESX(A,B,ALPHA,BETA[,VSL][,VSR][,SELECT][,SDIM][,RCONDE][,RCONDV][,INFO])

GGEV(A,B,ALPHAR,ALPHAI,BETA[,VL][,VR][,INFO])

GGEV(A,B,ALPHA,BETA[,VL][,VR][,INFO])

GGEVX(A,B,ALPHAR,ALPHAI,BETA[,VL][,VR][,BALANC][,ILO][,IHI][,LSCALE][,RSCALE][,ABNRM]

[,BBNRM][,RCONDE][,RCONDV][,INFO])

GGEVX(A,B,ALPHA,BETA[,VL][,VR][,BALANC][,ILO][,IHI][,LSCALE][,RSCALE][,ABNRM]

2986


[,BBNRM][,RCONDE][,RCONDV][,INFO])

GERFS(A,AF,IPIV,B,X[,TRANS][,FERR][,BERR][,INFO])

Interfaces with Replaced Argument NamesArgument names in the routines of this group are replaced as follows:

MKL Argument NameNetlib Argument Name

AAPAABAFAFPBBPBBB

SPSV(A,B[,UPLO][,IPIV][,INFO])

! netlib: (AP,B,UPLO,IPIV,INFO)

SPSVX(A,B,X[,UPLO][,AF][,IPIV][,FACT][,FERR][,BERR][,RCOND][,INFO])

! netlib: (A,B,X,UPLO,AFP,IPIV,FACT,FERR,BERR,RCOND,INFO)

HPSVX(A,B,X[,UPLO][,AF][,IPIV][,FACT][,FERR][,BERR][,RCOND][,INFO])

! netlib: (A,B,X,UPLO,AFP,IPIV,FACT,FERR,BERR,RCOND,INFO)

HEEVD(A,W[,JOB][,UPLO][,INFO])

! netlib: (A,W,JOBZ,UPLO,INFO)

SPGVD(A,B,W[,ITYPE][,UPLO][,Z][,INFO])

! netlib: (AP,BP,W,ITYPE,UPLO,Z,INFO)

HPGVD(A,B,W[,ITYPE][,UPLO][,Z][,INFO])

! netlib: (AP,BP,W,ITYPE,UPLO,Z,INFO)

SPGVX(A,B,W[,ITYPE][,UPLO][,Z][,VL][,VU][,IL][,IU][,M][,IFAIL][,ABSTOL][,INFO])

! netlib: (AP,BP,W,ITYPE,UPLO,Z,VL,VU,IL,IU,M,IFAIL,ABSTOL,INFO)

HPGVX(A,B,W[,ITYPE][,UPLO][,Z][,VL][,VU][,IL][,IU][,M][,IFAIL][,ABSTOL][,INFO])

! netlib: (AP,BP,W,ITYPE,UPLO,Z,VL,VU,IL,IU,M,IFAIL,ABSTOL,INFO)

SBGVD(A,B,W[,UPLO][,Z][,INFO])

2987


! netlib: (AB,BB,W,UPLO,Z,INFO)

HBGVD(A,B,W[,UPLO][,Z][,INFO])

! netlib: (AB,BB,W,UPLO,Z,INFO)

SBGVX(A,B,W[,UPLO][,Z][,VL][,VU][,IL][,IU][,M][,IFAIL][,Q][,ABSTOL][,INFO])

! netlib: (AB,BB,W,UPLO,Z,VL,VU,IL,IU,M,IFAIL,Q,ABSTOL,INFO)

HBGVX(A,B,W[,UPLO][,Z][,VL][,VU][,IL][,IU][,M][,IFAIL][,Q][,ABSTOL][,INFO])

! netlib: (AB,BB,W,UPLO,Z,VL,VU,IL,IU,M,IFAIL,Q,ABSTOL,INFO)

GBSVX(A,B,X[,KL][,AF][,IPIV][,FACT][,TRANS][,EQUED][,R][,C][,FERR]

[,BERR][,RCOND][,RPVGRW][,INFO])

! netlib: !(A,B,X,KL,AFB,IPIV,FACT,TRANS,EQUED,R,C,FERR,

BERR,RCOND,RPVGRW,INFO)

2988


Modified Netlib InterfacesSYEVX(A,W[,UPLO][,Z][,VL][,VU][,IL][,IU][,M][,IFAIL][,ABSTOL][,INFO])

! Interface netlib95 exists, parameters:

! netlib: (A,W,JOBZ,UPLO,VL,VU,IL,IU,M,IFAIL,ABSTOL,INFO)

! Different order for parameter UPLO, netlib: 4, mkl: 3

! Absent mkl parameter: JOBZ

! Extra mkl parameter: Z

HEEVX(A,W[,UPLO][,Z][,VL][,VU][,IL][,IU][,M][,IFAIL][,ABSTOL][,INFO])


! netlib: (A,W,JOBZ,UPLO,VL,VU,IL,IU,M,IFAIL,ABSTOL,INFO)




SYEVR(A,W[,UPLO][,Z][,VL][,VU][,IL][,IU][,M][,ISUPPZ][,ABSTOL][,INFO])


! netlib: (A,W,JOBZ,UPLO,VL,VU,IL,IU,M,ISUPPZ,ABSTOL,INFO)




HEEVR(A,W[,UPLO][,Z][,VL][,VU][,IL][,IU][,M][,ISUPPZ][,ABSTOL][,INFO])


! netlib: (A,W,JOBZ,UPLO,VL,VU,IL,IU,M,ISUPPZ,ABSTOL,INFO)




GESDD(A,S[,U][,VT][,JOBZ][,INFO])


2989


! netlib: (A,S,U,VT,WW,JOB,INFO)

! Different number for parameter, netlib: 7, mkl: 6

! Absent mkl parameter: WW

! Absent mkl parameter: JOB

! Different order for parameter INFO, netlib: 7, mkl: 6

! Extra mkl parameter: JOBZ

SYGVX(A,B,W[,ITYPE][,UPLO][,Z][,VL][,VU][,IL][,IU][,M][,IFAIL][,ABSTOL][,INFO])


! netlib: (A,B,W,ITYPE,JOBZ,UPLO,VL,VU,IL,IU,M,IFAIL,ABSTOL,INFO)




HEGVX(A,B,W[,ITYPE][,UPLO][,Z][,VL][,VU][,IL][,IU][,M][,IFAIL][,ABSTOL][,INFO])


! netlib: (A,B,W,ITYPE,JOBZ,UPLO,VL,VU,IL,IU,M,IFAIL,ABSTOL,INFO)




GETRS(A,IPIV,B[,TRANS][,INFO])

! Interface netlib95 exists:

! Different intents for parameter A, netlib: INOUT, mkl: IN

2990


Interfaces Absent From NetlibGTTRF(DL,D,DU,DU2[,IPIV][,INFO])

PPTRF(A[,UPLO][,INFO])

PBTRF(A[,UPLO][,INFO])

PTTRF(D,E[,INFO])

SYTRF(A[,UPLO][,IPIV][,INFO])

HETRF(A[,UPLO][,IPIV][,INFO])

SPTRF(A[,UPLO][,IPIV][,INFO])

HPTRF(A[,UPLO][,IPIV][,INFO])

GBTRS(A,B,IPIV[,KL][,TRANS][,INFO])

GTTRS(DL,D,DU,DU2,B,IPIV[,TRANS][,INFO])

POTRS(A,B[,UPLO][,INFO])

PPTRS(A,B[,UPLO][,INFO])

PBTRS(A,B[,UPLO][,INFO])

PTTRS(D,E,B[,INFO])

2991


PTTRS(D,E,B[,UPLO][,INFO])

SYTRS(A,B,IPIV[,UPLO][,INFO])

HETRS(A,B,IPIV[,UPLO][,INFO])

SPTRS(A,B,IPIV[,UPLO][,INFO])

HPTRS(A,B,IPIV[,UPLO][,INFO])

TRTRS(A,B[,UPLO][,TRANS][,DIAG][,INFO])

TPTRS(A,B[,UPLO][,TRANS][,DIAG][,INFO])

TBTRS(A,B[,UPLO][,TRANS][,DIAG][,INFO])

GECON(A,ANORM,RCOND[,NORM][,INFO])

GBCON(A,IPIV,ANORM,RCOND[,KL][,NORM][,INFO])

GTCON(DL,D,DU,DU2,IPIV,ANORM,RCOND[,NORM][,INFO])

POCON(A,ANORM,RCOND[,UPLO][,INFO])

PPCON(A,ANORM,RCOND[,UPLO][,INFO])

PBCON(A,ANORM,RCOND[,UPLO][,INFO])

PTCON(D,E,ANORM,RCOND[,INFO])

SYCON(A,IPIV,ANORM,RCOND[,UPLO][,INFO])

HECON(A,IPIV,ANORM,RCOND[,UPLO][,INFO])

SPCON(A,IPIV,ANORM,RCOND[,UPLO][,INFO])

HPCON(A,IPIV,ANORM,RCOND[,UPLO][,INFO])

TRCON(A,RCOND[,UPLO][,DIAG][,NORM][,INFO])

TPCON(A,RCOND[,UPLO][,DIAG][,NORM][,INFO])

TBCON(A,RCOND[,UPLO][,DIAG][,NORM][,INFO])

GBRFS(A,AF,IPIV,B,X[,KL][,TRANS][,FERR][,BERR][,INFO])

GTRFS(DL,D,DU,DLF,DF,DUF,DU2,IPIV,B,X[,TRANS][,FERR][,BERR][,INFO])

PORFS(A,AF,B,X[,UPLO][,FERR][,BERR][,INFO])

PPRFS(A,AF,B,X[,UPLO][,FERR][,BERR][,INFO])

PBRFS(A,AF,B,X[,UPLO][,FERR][,BERR][,INFO])

PTRFS(D,DF,E,EF,B,X[,FERR][,BERR][,INFO])

2992


PTRFS(D,DF,E,EF,B,X[,UPLO][,FERR][,BERR][,INFO])

SYRFS(A,AF,IPIV,B,X[,UPLO][,FERR][,BERR][,INFO])

HERFS(A,AF,IPIV,B,X[,UPLO][,FERR][,BERR][,INFO])

SPRFS(A,AF,IPIV,B,X[,UPLO][,FERR][,BERR][,INFO])

HPRFS(A,AF,IPIV,B,X[,UPLO][,FERR][,BERR][,INFO])

TRRFS(A,B,X[,UPLO][,TRANS][,DIAG][,FERR][,BERR][,INFO])

TPRFS(A,B,X[,UPLO][,TRANS][,DIAG][,FERR][,BERR][,INFO])

TBRFS(A,B,X[,UPLO][,TRANS][,DIAG][,FERR][,BERR][,INFO])

POTRI(A[,UPLO][,INFO])

PPTRI(A[,UPLO][,INFO])

SYTRI(A,IPIV[,UPLO][,INFO])

HETRI(A,IPIV[,UPLO][,INFO])

SPTRI(A,IPIV[,UPLO][,INFO])

HPTRI(A,IPIV[,UPLO][,INFO])

TRTRI(A[,UPLO][,DIAG][,INFO])

TPTRI(A[,UPLO][,DIAG][,INFO])

GBEQU(A,R,C[,KL][,ROWCND][,COLCND][,AMAX][,INFO])

POEQU(A,S[,SCOND][,AMAX][,INFO])

PPEQU(A,S[,SCOND][,AMAX][,UPLO][,INFO])

PBEQU(A,S[,SCOND][,AMAX][,UPLO][,INFO])

HESV(A,B[,UPLO][,IPIV][,INFO])

HPSV(A,B[,UPLO][,IPIV][,INFO])

GEQRF(A[,TAU][,INFO])

GEQPF(A,JPVT[,TAU][,INFO])

GEQP3(A,JPVT[,TAU][,INFO])

ORGQR(A,TAU[,INFO])

ORMQR(A,TAU,C[,SIDE][,TRANS][,INFO])

UNGQR(A,TAU[,INFO])

2993


UNMQR(A,TAU,C[,SIDE][,TRANS][,INFO])

GELQF(A[,TAU][,INFO])

ORGLQ(A,TAU[,INFO])

ORMLQ(A,TAU,C[,SIDE][,TRANS][,INFO])

UNGLQ(A,TAU[,INFO])

UNMLQ(A,TAU,C[,SIDE][,TRANS][,INFO])

GEQLF(A[,TAU][,INFO])

ORGQL(A,TAU[,INFO])

UNGQL(A,TAU[,INFO])

ORMQL(A,TAU,C[,SIDE][,TRANS][,INFO])

UNMQL(A,TAU,C[,SIDE][,TRANS][,INFO])

GERQF(A[,TAU][,INFO])

ORGRQ(A,TAU[,INFO])

UNGRQ(A,TAU[,INFO])

ORMRQ(A,TAU,C[,SIDE][,TRANS][,INFO])

UNMRQ(A,TAU,C[,SIDE][,TRANS][,INFO])

TZRZF(A[,TAU][,INFO])

ORMRZ(A,TAU,C,L[,SIDE][,TRANS][,INFO])

UNMRZ(A,TAU,C,L[,SIDE][,TRANS][,INFO])

GGQRF(A,B[,TAUA][,TAUB][,INFO])

GGRQF(A,B[,TAUA][,TAUB][,INFO])

GEBRD(A[,D][,E][,TAUQ][,TAUP][,INFO])

GBBRD(A[,C][,D][,E][,Q][,PT][,KL][,M][,INFO])

ORGBR(A,TAU[,VECT][,INFO])

ORMBR(A,TAU,C[,VECT][,SIDE][,TRANS][,INFO])

ORMTR(A,TAU,C[,SIDE][,UPLO][,TRANS][,INFO])

UNGBR(A,TAU[,VECT][,INFO])

UNMBR(A,TAU,C[,VECT][,SIDE][,TRANS][,INFO])

2994


BDSQR(D,E[,VT][,U][,C][,UPLO][,INFO])

BDSDC(D,E[,U][,VT][,Q][,IQ][,UPLO][,INFO])

UNMTR(A,TAU,C[,SIDE][,UPLO][,TRANS][,INFO])

SPTRD(A,TAU[,UPLO][,INFO])

OPGTR(A,TAU,Q[,UPLO][,INFO])

OPMTR(A,TAU,C[,SIDE][,UPLO][,TRANS][,INFO])

HPTRD(A,TAU[,UPLO][,INFO])

UPGTR(A,TAU,Q[,UPLO][,INFO])

UPMTR(A,TAU,C[,SIDE][,UPLO][,TRANS][,INFO])

SBTRD(A[,Q][,VECT][,UPLO][,INFO])

HBTRD(A[,Q][,VECT][,UPLO][,INFO])

STERF(D,E[,INFO])

STEQR(D,E[,Z][,COMPZ][,INFO])

STEDC(D,E[,Z][,COMPZ][,INFO])

STEGR(D,E,W[,Z][,VL][,VU][,IL][,IU][,M][,ISUPPZ][,ABSTOL][,INFO])

PTEQR(D,E[,Z][,COMPZ][,INFO])

STEBZ(D,E,M,NSPLIT,W,IBLOCK,ISPLIT[,ORDER][,VL][,VU][,IL][,IU][,ABSTOL][,INFO])

STEIN(D,E,W,IBLOCK,ISPLIT,Z[,IFAILV][,INFO])

DISNA(D,SEP[,JOB][,MINMN][,INFO])

SPGST(A,B[,ITYPE][,UPLO][,INFO])

HPGST(A,B[,ITYPE][,UPLO][,INFO])

SBGST(A,B[,X][,UPLO][,INFO])

HBGST(A,B[,X][,UPLO][,INFO])

PBSTF(B[,UPLO][,INFO])

GEHRD(A[,TAU][,ILO][,IHI][,INFO])

ORGHR(A,TAU[,ILO][,IHI][,INFO])

ORMHR(A,TAU,C[,ILO][,IHI][,SIDE][,TRANS][,INFO])

UNGHR(A,TAU[,ILO][,IHI][,INFO])

2995


UNMHR(A,TAU,C[,ILO][,IHI][,SIDE][,TRANS][,INFO])

GEBAL(A[,SCALE][,ILO][,IHI][,JOB][,INFO])

GEBAK(V,SCALE[,ILO][,IHI][,JOB][,SIDE][,INFO])

HSEQR(H,WR,WI[,ILO][,IHI][,Z][,JOB][,COMPZ][,INFO])

HSEQR(H,W[,ILO][,IHI][,Z][,JOB][,COMPZ][,INFO])

HSEIN(H,WR,WI,SELECT[,VL][,VR][,IFAILL][,IFAILR][,INITV][,EIGSRC][,M][,INFO])

HSEIN(H,W,SELECT[,VL][,VR][,IFAILL][,IFAILR][,INITV][,EIGSRC][,M][,INFO])

TREVC(T[,HOWMNY][,SELECT][,VL][,VR][,M][,INFO])

TRSNA(T[,S][,SEP][,VL][,VR][,SELECT][,M][,INFO])

TREXC(T,IFST,ILST[,Q][,INFO])

TRSEN(T,SELECT[,WR][,WI][,M][,S][,SEP][,Q][,INFO])

TRSEN(T,SELECT[,W][,M][,S][,SEP][,Q][,INFO])

TRSYL(A,B,C,SCALE[,TRANA][,TRANB][,ISGN][,INFO])

GGHRD(A,B[,ILO][,IHI][,Q][,Z][,COMPQ][,COMPZ][,INFO])

GGBAL(A,B[,ILO][,IHI][,LSCALE][,RSCALE][,JOB][,INFO])

GGBAK(V[,ILO][,IHI][,LSCALE][,RSCALE][,JOB][,INFO])

HGEQZ(H,T[,ILO][,IHI][,ALPHAR][,ALPHAI][,BETA][,Q][,Z][,JOB][,COMPQ][,COMPZ][,INFO])

HGEQZ(H,T[,ILO][,IHI][,ALPHA][,BETA][,Q][,Z][,JOB][,COMPQ][,COMPZ][,INFO])

TGEVC(S,P[,HOWMNY][,SELECT][,VL][,VR][,M][,INFO])

TGEXC(A,B[,IFST][,ILST][,Z][,Q][,INFO])

TGSEN(A,B,SELECT[,ALPHAR][,ALPHAI][,BETA][,IJOB][,Q][,Z][,PL][,PR][,DIF][,M][,INFO])

TGSEN(A,B,SELECT[,ALPHA][,BETA][,IJOB][,Q][,Z][,PL][,PR][,DIF][,M][,INFO])

TGSYL(A,B,C,D,E,F[,IJOB][,TRANS][,SCALE][,DIF][,INFO])

TGSNA(A,B[,S][,DIF][,VL][,VR][,SELECT][,M][,INFO])

GGSVP(A,B,TOLA,TOLB[,K][,L][,U][,V][,Q][,INFO])

TGSJA(A,B,TOLA,TOLB,K,L[,U][,V][,Q][,JOBU][,JOBV][,JOBQ][,ALPHA][,BETA][,NCYCLE][,INFO])

2996


Interfaces of New FunctionalityGETRF(A[,IPIV][,INFO])


! netlib: (A,IPIV,RCOND,NORM,INFO)



! Absent mkl parameter: NORM

! Absent mkl parameter: RCOND

GBTRF(A[,KL][,M][,IPIV][,INFO])


! netlib: (A,K,M,IPIV,RCOND,NORM,INFO)




! Replace parameter name: netlib: K: mkl: KL


POTRF(A[,UPLO][,INFO])


! netlib: (A,UPLO,RCOND,NORM,INFO)





2997


FOptimization Solvers Basics

Classical optimization methods are linear methods that are based on calculation of direction and searchingfor the best point in the direction of the chosen gradient. Direction is calculated on each iteration. Examplesof these methods are Conjugate Gradients method and Quickest Descent Method. Method of searchingfor direction usually requires calculation of subtask that approximates the objective function in neighborhoodof the current point.

Optimization problems are generally made up of three basic components:

• An objective function that needs to be minimized or maximized. For instance, in a manufacturingprocess, you might want to maximize the profit or minimize the cost. In fitting experimental data toa user-defined model, the total deviation of observed data can be minimized from predictions basedon the model.

• A set of unknowns or variables that affect the value of the objective function. In the manufacturingproblem, the variables might include the amounts of different resources used or the time spent oneach activity. In fitting-the-data problem, the unknowns are the parameters that define the model.

• A set of constraints that allow the unknowns to take on certain values but exclude others. For themanufacturing problem, it does not make sense to spend a negative amount of time on any activity,so all the "time" variables are constrained to be non-negative.

So the optimization problem can be formulated as follows: Find values of the variables that minimizeor maximize the objective function while satisfying the constraints.

Intel® MKL provides math optimization tools, namely the Trust-Region (TR) solvers routines for solvingnonlinear least squares problem with or without linear (bound) constraints.

Nonlinear Least Square ProblemNonlinear least squares problem can be described as follows:

where F(x) : Rn → R is a twice differentiable function in Rn. Solving of nonlinear least squaresproblem is searching for the best approximation to vector y with the model function that has nonlineardependence on variables Ñ . Best approximation means that the sum of squares of residuals is thelowest possible.

2999

Let us designate

so

where R(x)={ri(x)}, i = 1, ... m.

Trust Region AlgorithmThe Trust Region (TR) algorithms are relatively new iterative algorithms for solving nonlinearoptimization problems. They are widely used in power engineering, finance, applied mathematics,physics, computer science, economics, sociology, biology, medicine, mechanical engineering,chemistry, and other areas. TR methods have global convergence and local super convergence,which differenciates them from line search methods and Newton methods. TR methods havebetter convergence when compared with widely-used Newton-type methods.

The main idea behind TR algorithm is calculating a trial step and checking if the next values ofх+ belong to the trust region [Conn00]. Calculation of the trial step is strongly associatedwith the approximation model.

If the nonlinear least squares problem is defined as

then the trial step is solution of the following subproblem:

3000

F Intel® Math Kernel Library Reference Manual

This operation is approximation of the objective function in neighborhood of the current pointx.

Let us consider a TR algorithm with boundary (linear) constraints :

The first-order necessary conditions for the vector to be minimized with this constraint arestated as

where D is the diagonal scaling matrix

with

for i = 1, ..., n.

Assume Ω ={x ∈ Rn: l ≤ x ≤ u} ⊂ Rn, and write int(Ω) for the strict nonempty interior

of Ω. At each iteration, the basic structure of the method involves solution of an ellipticaltrust-region subproblem and computation of a search step to update the current iteratation.

3001

Optimization Solvers Basics F

Assume xi ∈ int(Ω) and the trust region size Δk > 0. Consider the elliptical trust-regionsubproblem given by

Let ptr(Δk) and pc(Δk) be a solution and a Cauchy point of the above trust-region subproblemrespectively [Conn00]. The search step (trial-step) used in the algorithm is defined by a linearcombination of two vectors ptr(Δk) and pc(Δk), that is,

A good agreement between the model function mk and the objective function f is ensured forthe following standard condition:

for the given constant β2 ∈ (0,1). If this condition is not met, reject p(Δk) and the trust-region

size Δk is adjusted to be successive reduction so that p(Δk) satisfies the accuracy requirement.

3002

F Intel® Math Kernel Library Reference Manual

BibliographyFor more information about the BLAS, Sparse BLAS, LAPACK, ScaLAPACK, Sparse Solver, IntervalSolver, VML, VSL, and DFT functionality, refer to the following publications:

• BLAS Level 1

C. Lawson, R. Hanson, D. Kincaid, and F. Krough. Basic Linear Algebra Subprograms for FortranUsage, ACM Transactions on Mathematical Software, Vol.5, No.3 (September 1979) 308-325.

• BLAS Level 2

J. Dongarra, J. Du Croz, S. Hammarling, and R. Hanson. An Extended Set of Fortran Basic LinearAlgebra Subprograms, ACM Transactions on Mathematical Software, Vol.14, No.1 (March 1988)1-32.

• BLAS Level 3

J. Dongarra, J. DuCroz, I. Duff, and S. Hammarling. A Set of Level 3 Basic Linear AlgebraSubprograms, ACM Transactions on Mathematical Software (December 1989).

• Sparse BLAS

D. Dodson, R. Grimes, and J. Lewis. Sparse Extensions to the FORTRAN Basic Linear AlgebraSubprograms, ACM Transactions on Math Software, Vol.17, No.2 (June 1991).

D. Dodson, R. Grimes, and J. Lewis. Algorithm 692: Model Implementation and Test Package forthe Sparse Basic Linear Algebra Subprograms, ACM Transactions on Mathematical Software,Vol.17, No.2 (June 1991).

I.S.Duff, A.M.Erisman, and J.K.Reid. Direct Methods for Sparse Matrices.Clarendon Press, Oxford, UK, 1986.

[Duff86]

Compaq Extended Math Library. Reference Guide, Oct.2001.[CXML01]

K.Remington. A NIST FORTRAN Sparse Blas User's Guide. (available onhttp://math.nist.gov/~KRemington/fspblas/)

[Rem05]

Y.Saad. SPARSKIT: A Basic Tool-kit for Sparse Matrix Computation.Version 2, 1994.(http://www.cs.umn.edu/~saad)

[Saad94]

Y.Saad. Iterative Methods for Linear Systems. PWS Publishing, Boston,1996.

[Saad96]

• LAPACK

E. Anderson, Z. Bai, C. Bischof, S. Blackford, J. Demmel, J. Dongarra,J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, and D.Sorensen. LAPACK Users' Guide, Third Edition, Society for Industrial andApplied Mathematics (SIAM), 1999.

[LUG]

G. Golub and C. Van Loan. Matrix Computations, Johns Hopkins UniversityPress, Baltimore, third edition,1996.

[Golub96]

http://citeseer.ist.psu.edu/bischof92framework.html[Bischof92]

3003

O.Marques, E.J.Riedy, and Ch.Voemel. Benefits of IEEE-754 Featuresin Modern Symmetric Tridiagonal Eigensolvers, SIAM Journal onScientific Computing, Vol.28, No.5, 2006. (Tech report version inLAPACK Working Note 172 with the same title.)

[Marques06]

W. Kahan. Accurate Eigenvalues of a Symmetric Tridiagonal Matrix,Report CS41, Computer Science Dept., Stanford University, July 21,1966.

[Kahan66]

I. Dhillon, B. Parlett. Multiple representations to compute orthogonaleigenvectors of symmetric tridiagonal matrices, Linear Algebra andits Applications, 387(1), pp. 1-28, August 2004.

[Dhillon04]

I. Dhillon, B. Parlett. Orthogonal Eigenvectors and * Relative Gaps,SIAM Journal on Matrix Analysis and Applications, Vol. 25, 2004.(Also LAPACK Working Note 154.)

[Dhillon04-02]

I. Dhillon. A new O(n^2) algorithm for the symmetric tridiagonaleigenvalue/eigenvector problem, Computer Science Division TechnicalReport No. UCB/CSD-97-971, UC Berkeley, May 1997.

[Dhillon97]

• ScaLAPACK

L. Blackford, J. Choi, A.Cleary, E. D'Azevedo, J. Demmel, I. Dhillon,J. Dongarra, S. Hammarling, G. Henry, A. Petitet, K.Stanley, D.Walker, and R. Whaley. ScaLAPACK Users' Guide, Society forIndustrial and Applied Mathematics (SIAM), 1997.

[SLUG]

• Sparse Solver

I. S. Duff and J. Koster. The Design and Use of Algorithms forPermuting Large Entries to the Diagonal of Sparse Matrices. SIAM J.Matrix Analysis and Applications, 20(4):889-901, 1999.

[Duff99]

J. Dongarra, V.Eijkhout, A.Kalhan. Reverse Communication Interfacefor Linear Algebra Templates for Iterative Methods. UT-CS-95-291,May 1995. http://www.netlib.org/lapack/lawnspdf/lawn99.pdf

[Dong95]

G. Karypis and V. Kumar. A Fast and High Quality Multilevel Schemefor Partitioning Irregular Graphs. SIAM Journal on ScientificComputing, 20(1):359-392, 1998.

[Karypis98]

X.S. Li and J.W. Demmel. A Scalable Sparse Direct Solver UsingStatic Pivoting. In Proceeding of the 9th SIAM conference on ParallelProcessing for Scientific Computing, San Antonio, Texas, March22-34,1999.

[Li99]

J.W.H. Liu. Modification of the Minimum-Degree algorithm by multipleelimination. ACM Transactions on Mathematical Software,11(2):141-153, 1985.

[Liu85]

3004


R. Menon L. Dagnum. OpenMP: An Industry-Standard API forShared-Memory Programming. IEEE Computational Science &Engineering, 1:46-55, 1998. http://www.openmp.org.

[Menon98]

Y. Saad. Iterative Methods for Sparse Linear Systems. 2nd edition,SIAM, Philadelphia, PA, 2003.

[Saad03]

O. Schenk. Scalable Parallel Sparse LU Factorization Methods onShared Memory Multiprocessors. PhD thesis, ETH Zurich, 2000.

[Schenk00]

O. Schenk, K. Gartner, and W. Fichtner. Efficient Sparse LUFactorization with Left-right Looking Strategy on Shared MemoryMultiprocessors. BIT, 40(1):158-176, 2000.

[Schenk00-2]

O. Schenk and K. Gartner. Sparse Factorization with Two-LevelScheduling in PARDISO. In Proceeding of the 10th SIAM conferenceon Parallel Processing for Scientific Computing, Portsmouth, Virginia,March 12-14, 2001.

[Schenk01]

O. Schenk and K. Gartner. Two-level scheduling in PARDISO:Improved Scalability on Shared Memory Multiprocessing Systems.Parallel Computing, 28:187-197, 2002.

[Schenk02]

O. Schenk and K. Gartner. Solving Unsymmetric Sparse Systems ofLinear Equations with PARDISO. Journal of Future GenerationComputer Systems, 20(3):475-487, 2004.

[Schenk03]

O. Schenk and K. Gartner. On Fast Factorization Pivoting Methodsfor Sparse Symmetric Indefinite Systems. Technical Report,Department of Computer Science, University of Basel, 2004,submitted.

[Schenk04]

P. Sonneveld. CGS, a Fast Lanczos-Type Solver for NonsymmetricLinear Systems. SIAM Journal on Scientific and Statistical Computing,10:36-52, 1989.

[Sonn89]

D.M.Young. Iterative Solution of Large Linear Systems. New York,Academic Press, Inc., 1971.

[Young71]

• VSL

Document included with Intel® MKL product (file name vslnotes.pdf).[VSL Notes]

Bratley P., Fox B.L., and Schrage L.E. A Guide to Simulation. 2ndedition. Springer-Verlag, New York, 1987.

[Bratley87]

Bratley P. and Fox B.L. Implementing Sobol`s Quasirandom SequenceGenerator, ACM Transactions on Mathematical Software, Vol. 14, No.1, Pages 88-100, March 1988.

[Bratley88]

Bratley P., Fox B.L., and Niederreiter H. Implementation and Testsof Low-Discrepancy Sequences, ACM Transactions on Modeling andComputer Simulation, Vol. 2, No. 3, Pages 195-213, July 1992.

[Bratley92]

3005

Bibliography

Coddington, P. D. Analysis of Random Number Generators UsingMonte Carlo Simulation. Int. J. Mod. Phys. C-5, 547, 1994.

[Coddington94]

Gentle, James E. Random Number Generation and Monte CarloMethods, Springer-Verlag New York, Inc., 1998.

[Gentle98]

L'Ecuyer, Pierre. Uniform Random Number Generation. Annals ofOperations Research, 53, 77-120, 1994.

[L'Ecuyer94]

L'Ecuyer, Pierre. Tables of Linear Congruential Generators of DifferentSizes and Good Lattice Structure. Mathematics of Computation, 68,225, 249-260, 1999.

[L'Ecuyer99]

L'Ecuyer, Pierre. Good Parameter Sets for Combined MultipleRecursive Random Number Generators. Operations Research, 47, 1,159-164, 1999.

[L'Ecuyer99a]

L'Ecuyer, Pierre. Software for Uniform Random Number Generation:Distinguishing the Good and the Bad. Proceedings of the 2001 WinterSimulation Conference, IEEE Press, 95-105, Dec. 2001.

[L'Ecuyer01]

Kirkpatrick, S., and Stoll, E. A Very Fast Shift-Register SequenceRandom Number Generator. Journal of Computational Physics, V.40. 517-526, 1981.

[Kirkpatrick81]

Knuth, Donald E. The Art of Computer Programming, Volume 2,Seminumerical Algorithms. 2nd edition, Addison-Wesley PublishingCompany, Reading, Massachusetts, 1981.

[Knuth81]

Matsumoto, M., and Nishimura, T. Mersenne Twister: A623-Dimensionally Equidistributed Uniform Pseudo-Random NumberGenerator, ACM Transactions on Modeling and Computer Simulation,Vol. 8, No. 1, Pages 3-30, January 1998.

[Matsumoto98]

Matsumoto, M., and Nishimura, T. Dynamic Creation of PseudorandomNumber Generators, 56-69, in: Monte Carlo and Quasi-Monte CarloMethods 1998, Ed. Niederreiter, H. and Spanier, J., Springer 2000,http://www.math.sci.hiroshima-u.ac.jp/%7Em-mat/MT/DC/dc.html.

[Matsumoto00]

NAG Numerical Libraries.http://www.nag.co.uk/numeric/numerical_libraries.asp

[NAG]

Sobol, I.M., and Levitan, Yu.L. The production of points uniformlydistributed in a multidimensional cube. Preprint 40, Institute ofApplied Mathematics, USSR Academy of Sciences, 1976 (In Russian).

[Sobol76]

• DFT

E. Oran Brigham, The Fast Fourier Transform and Its Applications,Prentice Hall, New Jersey, 1988.

[1]

Athanasios Papoulis, The Fourier Integral and its Applications, 2ndedition, McGraw-Hill, New York, 1984.

[2]

3006


Ping Tak Peter Tang, DFTI, a New API for DFT: Motivation, Design,and Rationale, July 2002.

[3]

Charles Van Loan, Computational Frameworks for the Fast FourierTransform, SIAM, Philadelphia, 1992

[4]

• VML

J.M.Muller. Elementary functions: algorithms and implementation, Birkhauser Boston, 1997.

IEEE Standard for Binary Floating-Point Arithmetic. ANSI/IEEE Std 754-1985.

• Interval Solver

G. Alefeld and J. Herzberger, Introduction to Interval Computations.- Academic Press, New York, 1983.

[Alefeld83]

A.H.Bentbib, Solving the full rank interval least squares problem //Applied Numerical Mathematics. - 2002. - Vol. 41. - P. 283-294.

[Bentbib02]

Ch. Bliek, Computer methods for design automation, Ph.D. Thesis.- Dept. of Ocean Engineering, Massachusetts Institute of Technology,1992.

[Bliek92]

R. Hammer, M. Hocks, U. Kulisch, D. Ratz, C++ Toolbox for VerifiedComputing I. Basic Numerical Problems. - Berlin-Heidelberg: Springer,1995.

[Hammer95]

E. Hansen, Bounding the solution of interval linear equations // SIAMJournal on Numerical Analysis. - 1992. - Vol. 29, No. 5. - P.1493-1503.

[Hansen92]

J. Herzberger, Iterative methods for the inclusion of the inverse ofa matrix // Topics in Validated Computations, J. Herzberger, ed. -Amsterdam: Elsevier, 1994. -

[Herzberger94]

Ch.Jansson, Interval linear systems with symmetric matrices,skew-symmetric matrices, and dependencies in the right hand side// Computing. - 1991. - Vol. 46. - P. 265 - 274.

[Jansson91]

R.B. Kearfott, Rigorous Global Search: Continuous Problems. -Dordrecht, Kluwer, 1996.

[Kearfott96]

R.B. Kearfott, M.T. Nakao, A. Neumaier, S.M. Rump, S.P. Shary, P.van Hentenryck, Standardized notation in interval analysis. - Anelectronic version of the paper is accessible athttp://www.mat.univie.ac.at/~neum/software/int/

[Kearfott]

V. Kreinovich, A. Lakeyev, J. Rohn, P. Kahl, Computational Complexityand Feasibility of Data Processing and Interval Computations. -Kluwer, Dordrecht, 1997.

[Kreinovich97]

A. Neumaier, Interval Methods for Systems of Equations. - Cambridge,Cambridge University Press, 1990.

[Neumaier90]

3007

Bibliography

A.Neumaier, A simple derivation of Hansen-Bliek-Rohn-Ning-Kearfottenclosure for linear interval equations // Reliable Computing. - 1999.- Vol. 5, No. 2. - P. 131-136.

[Neumaier99]

S. Ning, R.B. Kearfott, A comparison of some methods for solvinglinear interval equations // SIAM Journal on Numerical Analysis. -1997. - Vol. 34, No. 4. - P. 1289-1305.

[Ning97]

G.Rex, J.Rohn, Sufficient conditions for regularity and singularity ofinterval matrices // SIAM Journal on Numerical Analysis. - 1999. -Vol. 20. - P. 437-445.

[Rex99]

J. Rohn, Cheap and tight bounds: the recent result by E. Hansen canbe made more efficient // Interval Computations. - 1993. - No. 4. -P. 13-21.

[Rohn93]

S. M.Rump, Solving algebraic problems with high accuracy // A NewApproach to Scientific Computation; Kulisch U. W. and Miranker W.L., eds. - New York: Academic Press, 1983. - P. 51-120.

[Rump83]

S. M.Rump, Solution of linear and nonlinear algebraic problems withsharp guaranteed bounds // Computing Supplement. - 1984. - Vol.5. - P. 147-168.

[Rump84]

S. M.Rump, Kaucher E. Small bounds for the solution of systems oflinear equations // Computing Supplement. - 1980. - Vol. 2. - P.157-164.

[Rump80]

S. Rump, INTLAB -- INTerval LABoratory. - 21 p. - An electronicversion of the paper is accessible at http://www.ti3.tu-harburg.de/rump/intlab/.

[Rump]

S.P. Shary, A new class of algorithms for optimal solution of intervallinear systems // Interval Computations. - 1992, No. 2(4). - P. 18-29.

[Shary92]

S.P. Shary, On optimal solution of interval linear equations // SIAMJournal on Numerical Analysis. - 1995. - Vol. 32, No. 2. - P. 610-630.

[Shary95]

S. P. Shary, A new technique in systems analysis under intervaluncertainty and ambiguity // Reliable Computing. - 2002. - Vol. 8. -321-419.

[Shary02]

S.P. Shary, A new class of methods for optimal enclosing solutionsets to interval linear systems // Journal of ComputationalMathematics. - to be published.

[Shary]

• Optimization Solvers

A. R. Conn, N. I.M. Gould, P. L. Toint.Trust-region Methods.SIAMSociety for Industrial & Applied Mathematics, Englewood Cliffs, NewJersey, MPS-SIAM Series on Optimization edition, 2000.

[Conn00]

J. Dongara, V. Eijkhout, A. Kalhan.Reverse communication interfacefor linear algebra templates for iterative methods.1995.

[Dong95]

3008


For a reference implementation of BLAS, sparse BLAS, LAPACK, and ScaLAPACK packages(without platform-specific optimizations) visit www.netlib.org

3009

Bibliography

GlossaryDenotes the conjugate of a general matrix A. See also conjugate matrix.AH

Denotes the transpose of a general matrix A. See also transpose.AT

A general m-by-n matrix A such that aij = 0 for |i - j| > l, where 1< l < min(m, n). For example, any tridiagonal matrix is a band matrix.

band matrix

A special storage scheme for band matrices. A matrix is stored in atwo-dimensional array: columns of the matrix are stored in thecorresponding columns of the array, and diagonals of the matrix arestored in rows of the array.

band storage

Abbreviation for Basic Linear Algebra Subprograms. These subprogramsimplement vector, matrix-vector, and matrix-matrix operations.

BLAS

Abbreviation for Basic Random Number Generator. Basic random numbergenerators are pseudorandom number generators imitating i.i.d. randomnumber sequences of uniform distribution. Distributions other than uniformare generated by applying different transformation techniques to thesequences of random numbers of uniform distribution.

BRNG

Standardized mechanism that allows a user to include a user-designedBRNG into the VSL and use it along with the predefined VSL basicgenerators.

BRNG registration

Representation of a real symmetric or complex Hermitian matrix A in theform A = PUDUHPT (or A = PLDLHPT) where P is a permutation matrix, Uand L are upper and lower triangular matrices with unit diagonal, and D is

Bunch-Kaufmanfactorization

a Hermitian block-diagonal matrix with 1-by-1 and 2-by-2 diagonal blocks.U and L have 2-by-2 unit diagonal blocks corresponding to the 2-by-2 blocksof D.

When found as the first letter of routine names, c indicates the usage ofsingle-precision complex data type.

c

C interface to the BLAS. See BLAS.CBLASCumulative Distribution Function. The function that determines probabilitydistribution for univariate or multivariate random variable X. For univariatedistribution the cumulative distribution function is the function of real

CDF

argument x, which for every x takes a value equal to probability of the

event A: X ≤ x. For multivariate distribution the cumulative distributionfunction is the function of a real vector x = (x1,x2, ..., xn), which, for

every x, takes a value equal to probability of the event A = (X1 ≤ x1 &

X2 ≤ x2, & ..., & Xn ≤ xn).

3011

Representation of a symmetric positive-definite or, for complex data,Hermitian positive-definite matrix A in the form A = UHU or A = LLH,where L is a lower triangular matrix and U is an upper triangular matrix.

Choleskyfactorization

The number κ(A) defined for a given square matrix A as follows: κ(A)= ||A|| ||A−1||.

condition number

The matrix AH defined for a given general matrix A as follows: (AH)ij= (aji)

*.conjugate matrix

The conjugate of a complex number z = a + bi is z* = a - bi.conjugate numberWhen found as the first letter of routine names, d indicates the usageof double-precision real data type.

d

Abbreviation for Discrete Fourier Transforms. See Chapter 11 of thisbook.

DFTs

The number denoted x · y and defined for given vectors x and y as

follows: x · y = Σi xiyi.

dot product

Here xi and yi stand for the i-th elements of x and y, respectively.

A floating-point data type. On Intel® processors, this data type allowsyou to store real numbers x such that 2.23*10−308< | x | <

1.79*10308. For this data type, the machine precision ε is approximately

double precision

10−15, which means that double-precision numbers usually contain nomore than 15 significant decimal digits.For more information, refer toIntel® 64 and IA-32 Architectures Software Developer's Manual, Volume1: Basic Architecture.

See eigenvalue problem.eigenvalue

A problem of finding non-zero vectors x and numbers λ (for a given

square matrix A) such that Ax = λx. Here the numbers λ are calledthe eigenvalues of the matrix A and the vectors x are called theeigenvectors of the matrix A.

eigenvalue problem

See eigenvalue problem.eigenvector

Matrix of a general form H = I − τvvT, where v is a column vector

and τ is a scalar. In LAPACK elementary reflectors are used, for example,to represent the matrix Q in the QR factorization (the matrix Q isrepresented as a product of elementary reflectors).

elementaryreflector(Householdermatrix)

Representation of a matrix as a product of matrices. See alsoBunch-Kaufman factorization, Cholesky factorization, LU factorization,LQ factorization, QR factorization, Schur factorization.

factorization

Abbreviation for Fast Fourier Transforms. See Chapter 11 of this book.FFTs

3012

A storage scheme allowing you to store matrices of any kind. A matrixA is stored in a two-dimensional array a, with the matrix element aijstored in the array element a(i,j).

full storage

A square matrix A that is equal to its conjugate matrix AH. The conjugateAH is defined as follows: (AH)ij = (aji)

*.Hermitian matrix

See identity matrix.IA square matrix I whose diagonal elements are 1, and off-diagonalelements are 0. For any matrix A, AI = A and IA = A.

identity matrix

Independent Identically Distributed.i.i.d.Qualifier of an operation. A function that performs its operation in-placetakes its input from an array and returns its output to the same array.

in-place

Abbreviation for Intel® Math Kernel Library.Intel MKLThe matrix denoted as A−1 and defined for a given square matrix A asfollows: AA−1 = A−1A = I. A−1 does not exist for singular matrices A.

inverse matrix

Representation of an m-by-n matrix A as A = LQ or A = (L 0)Q. Here

Q is an n-by-n orthogonal (unitary) matrix. For m ≤ n, L is an m-by-mlower triangular matrix with real diagonal elements; for m > n,

LQ factorization

where L1 is an n-by-n lower triangular matrix, and L2 is a rectangularmatrix.

Representation of a general m-by-n matrix A as A = PLU, where P is apermutation matrix, L is lower triangular with unit diagonal elements(lower trapezoidal if m > n) and U is upper triangular (upper trapezoidalif m < n).

LU factorization

The number ε determining the precision of the machine representationof real numbers. For Intel® architecture, the machine precision isapproximately 10−7 for single-precision data, and approximately 10−15

machine precision

for double-precision data. The precision also determines the numberof significant decimal digits in the machine representation of realnumbers. See also double precision and single precision.

Message Passing Interface. This standard defines the user interfaceand functionality for a wide range of message-passing capabilities inparallel computing.

MPI

3013

Glossary

A freely available, portable implementation of MPI standard formessage-passing libraries.

MPICH

A real square matrix A whose transpose and inverse are equal, that is,AT = A-1, and therefore AAT = ATA = I. All eigenvalues of anorthogonal matrix have the absolute value 1.

orthogonal matrix

A storage scheme allowing you to store symmetric, Hermitian, ortriangular matrices more compactly. The upper or lower triangle of amatrix is packed by columns in a one-dimensional array.

packed storage

Probability Density Function. The function that determines probabilitydistribution for univariate or multivariate continuous random variableX. The probability density function f(x) is closely related with thecumulative distribution function F(x). For univariate distribution therelation is

PDF

For multivariate distribution the relation is

A square matrix A such that Ax · x > 0 for any non-zero vector x.Here · denotes the dot product.

positive-definitematrix

A completely deterministic algorithm that imitates truly randomsequences.

pseudorandomnumber generator

Representation of an m-by-n matrix A as A = QR, where Q is an m-by-morthogonal (unitary) matrix, and R is n-by-n upper triangular with real

diagonal elements (if m ≥ n) or trapezoidal (if m < n) matrix.

QR factorization

An abstract source of independent identically distributed randomnumbers of uniform distribution. In this manual a random stream pointsto a structure that uniquely defines a random number sequencegenerated by a basic generator associated with a given random stream.

random stream

3014

Abbreviation for Random Number Generator. In this manual the term"random number generators" stands for pseudorandom numbergenerators, that is, generators based on completely deterministicalgorithms imitating truly random sequences.

RNG

When found as the first letter of routine names, s indicates the usageof single-precision real data type.

s

Stands for Scalable Linear Algebra PACKage.ScaLAPACKRepresentation of a square matrix A in the form A = ZTZH. Here T isan upper quasi-triangular matrix (for complex A, triangular matrix)called the Schur form of A; the matrix Z is orthogonal (for complex A,unitary). Columns of Z are called Schur vectors.

Schur factorization

A floating-point data type. On Intel® processors, this data type allowsyou to store real numbers x such that 1.18*10−38 < | x | <

3.40*1038. For this data type, the machine precision (ε) is

single precision

approximately 10−7, which means that single-precision numbers usuallycontain no more than 7 significant decimal digits. For more information,refer to Intel® 64 and IA-32 Architectures Software Developer's Manual,Volume 1: Basic Architecture.

A matrix whose determinant is zero. If A is a singular matrix, the inverseA-1 does not exist, and the system of equations Ax = b does not havea unique solution (that is, there exist no solutions or an infinite numberof solutions).

singular matrix

The numbers defined for a given general matrix A as the eigenvaluesof the matrix AAH. See also SVD.

singular value

Abbreviation for Symmetric MultiProcessing. The MKL offers performancegains through parallelism provided by the SMP feature.

SMP

Routines performing basic vector operations on sparse vectors. SparseBLAS routines take advantage of vectors' sparsity: they allow you tostore only non-zero elements of vectors. See BLAS.

sparse BLAS

Vectors in which most of the components are zeros.sparse vectorsThe way of storing matrices. See full storage, packed storage, and bandstorage.

storage scheme

Abbreviation for Singular Value Decomposition. See also Singular valuedecomposition section in Chapter 5.

SVD

A square matrix A such that aij = aji.symmetric matrixThe transpose of a given matrix A is a matrix AT such that (AT)ij =aji (rows of A become columns of AT, and columns of A become rowsof AT).

transpose

3015

Glossary

A matrix A such that A = (A1A2), where A1 is an upper triangularmatrix, A2 is a rectangular matrix.

trapezoidal matrix

A matrix A is called an upper (lower) triangular matrix if all itssubdiagonal elements (superdiagonal elements) are zeros. Thus, foran upper triangular matrix aij = 0 when i > j; for a lower triangularmatrix aij = 0 when i < j.

triangular matrix

A matrix whose non-zero elements are in three diagonals only: theleading diagonal, the first subdiagonal, and the first super-diagonal.

tridiagonal matrix

A complex square matrix A whose conjugate and inverse are equal,that is, that is, AH = A-1, and therefore AAH = AHA = I. All eigenvaluesof a unitary matrix have the absolute value 1.

unitary matrix

Abbreviation for Vector Mathematical Library. See Chapter 9 of thisbook.

VML

Abbreviation for Vector Statistical Library. See Chapter 10 of this book.VSLWhen found as the first letter of routine names, z indicates the usageof double-precision complex data type.

z

3016

Index?_backward_trig_transform 2568?_commit_Helmholtz_2D 2595?_commit_Helmholtz_3D 2595?_commit_sph_np 2611?_commit_sph_p 2611?_commit_trig_transform 2563?_forward_trig_transform 2566?_Helmholtz_2D 2601?_Helmholtz_3D 2601?_init_Helmholtz_2D 2592?_init_Helmholtz_3D 2592?_init_sph_np 2608?_init_sph_p 2608?_init_trig_transform 2562?_sph_np 2613?_sph_p 2613?asum 50?axpy 51?axpyi 179?bdsdc 672?bdsqr 668?ConvExec 2423?ConvExec1D 2425?ConvExecX 2428?ConvExecX1D 2431?ConvNewTask 2403?ConvNewTask1D 2406?ConvNewTaskX 2408?copy 53?CorrExec 2423?CorrExec1D 2425?CorrExecX 2428?CorrExecX1D 2431?CorrNewTask 2403

?CorrNewTask1D 2406?CorrNewTaskX 2408?CorrNewTaskX1D 2411?dbtf2 2101?dbtrf 2103?disna 754?dot 54?dotc 57?dotci 182?doti 181?dotu 59?dotui 184?dttrf 2105?dttrsv 2106?gbbrd 650?gbcon 365?gbequ 460?gbmv 77?gbrfs 400?gbsv 481?gbsvx 484?gbtf2 1199?gbtrf 300?gbtrs 328?gebak 798?gebal 794?gebd2 1201?gebrd 646?gecon 363?geequ 458?gees 1016?geesx 1022?geev 1028?geevx 1033

3017

?gegas 2531?gegss 2537?gehbs 2539?gehd2 1203?gehrd 779?gehss 2533?gekws 2535?gelq2 1205?gelqf 585?gels 892?gelsd 905?gelss 901?gelsy 896?gemip 2553?gemm 145?gemv 81?gepps 2540?gepss 2543?geql2 1207?geqlf 599?geqp3 570?geqpf 567?geqr2 1209?geqrf 563?ger 84?gerbr 2549?gerc 86?gerfs 397?gerq2 1210?gerqf 613?geru 88?gesc2 1212?gesdd 1046?gesv 471?gesvd 1041?gesvr 2551?gesvx 475?geszi 2548?getc2 1213?getf2 1215?getrf 297?getri 439?getrs 325?ggbak 841?ggbal 839?gges 1137?ggesx 1144?ggev 1153?ggevx 1159

?ggglm 914?gghrd 835?gglse 910?ggqrf 635?ggrqf 639?ggsvd 1051?ggsvp 879?gtcon 368?gthr 185?gthrz 187?gtrfs 403?gtsv 491?gtsvx 493?gttrf 302?gttrs 331?gtts2 1216?hbev 979?hbevd 986?hbevx 995?hbgst 769?hbgv 1114?hbgvd 1121?hbgvx 1131?hbtrd 718?hecon 382?heev 921?heevd 928?heevr 948?heevx 937?heft2 1557?hegst 759?hegv 1062?hegvd 1069?hegvx 1080?hemm 149?hemv 93?her 96?her2 98?her2k 155?herdb 686?herfs 421?herk 152?hesv 535?hesvx 538?hetrd 694?hetrf 316?hetri 448?hetrs 346?hgeqz 844

3018


?hpcon 386?hpev 957?hpevd 963?hpevx 972?hpgst 764?hpgv 1089?hpgvd 1096?hpgvx 1106?hpmv 101?hpr 103?hpr2 105?hprfs 427?hpsv 550?hpsvx 552?hptrd 709?hptrf 322?hptri 452?hptrs 352?hsein 806?hseqr 800?isnan 1218?labrd 1219?lacgv 1184?lacn2 1222?lacon 1224?lacpy 1225?lacrm 1185?lacrt 1186?ladiv 1227?lae2 1228?laebz 1229?laed0 1234?laed1 1237?laed2 1239?laed3 1242?laed4 1244?laed5 1246?laed6 1247?laed7 1249?laed8 1253?laed9 1256?laeda 1258?laein 1260?laesy 1187?laev2 1263?laexc 1265?lag2 1267?lags2 1269?lagtf 1271

?lagtm 1273?lagts 1275?lagv2 1277?lahef 1498?lahqr 1280?lahr2 1286?lahrd 1283?laic1 1289?laisnan 1218?laln2 1291?lals0 1295?lalsa 1299?lalsd 1303?lamc1 1583?lamc2 1584?lamc3 1585?lamc4 1585?lamc5 1586?lamch 1582?lamrg 1306?lamsh 2092?laneg 1307?langb 1308?lange 1310?langt 1311?lanhb 1316?lanhe 1325?lanhp 1320?lanhs 1313?lansb 1314?lansp 1318?lanst/?lanht 1322?lansy 1323?lantb 1327?lantp 1329?lantr 1331?lanv2 1333?lapll 1334?lapmt 1335?lapy2 1337?lapy3 1337?laqgb 1338?laqge 1340?laqhb 1342?laqp2 1344?laqps 1346?laqr0 1348?laqr1 1352?laqr2 1354

3019

Index

?laqr3 1358?laqr4 1362?laqr5 1366?laqsb 1370?laqsp 1372?laqsy 1374?laqtr 1375?lar1v 1378?lar2v 1381?laref 2094?larf 1383?larfb 1385?larfg 1387?larft 1389?larfx 1392?largv 1393?larnv 1395?larra 1396?larrb 1398?larrc 1400?larrd 1402?larre 1406?larrf 1410?larrj 1413?larrk 1415?larrr 1416?larrv 1418?lartg 1422?lartv 1424?laruv 1425?larz 1426?larzb 1428?larzt 1431?las2 1434?lascl 1436?lasd0 1437?lasd1 1439?lasd2 1443?lasd3 1447?lasd4 1450?lasd5 1452?lasd6 1453?lasd7 1458?lasd8 1462?lasd9 1464?lasda 1467?lasdq 1471?lasdt 1474?laset 1475

?lasorte 2096?lasq1 1476?lasq2 1477?lasq3 1479?lasq4 1480?lasq5 1481?lasq6 1483?lasr 1484?lasrt 1488?lasrt2 2097?lassq 1489?lasv2 1491?laswp 1492?lasy2 1494?lasyf 1496?latbs 1501?latdf 1503?latps 1506?latrd 1508?latrs 1512?latrz 1516?lauu2 1519?lauum 1520?lazq3 1521?lazq4 1524?nrm2 60?opgtr 704?opmtr 706?org2l/?ung2l 1526?org2r/?ung2r 1527?orgbr 653?orghr 781?orgl2/?ungl2 1529?orglq 588?orgql 602?orgqr 573?orgr2/?ungr2 1531?orgrq 616?orgtr 689?orm2l/?unm2l 1532?orm2r/?unm2r 1535?ormbr 657?ormhr 784?orml2/?unml2 1537?ormlq 591?ormql 607?ormqr 576?ormr2/?unmr2 1540?ormr3/?unmr3 1542

3020


?ormrq 620?ormrz 629?ormtr 691?pbcon 375?pbequ 467?pbrfs 412?pbstf 772?pbsv 513?pbsvx 516?pbtf2 1545?pbtrf 309?pbtrs 339?pocon 371?poequ 463?porfs 406?posv 498?posvx 500?potf2 1547?potrf 304?potri 442?potrs 334?ppcon 373?ppequ 465?pprfs 409?ppsv 505?ppsvx 508?pptrf 306?pptri 444?pptrs 336?ptcon 378?pteqr 743?ptrfs 415?ptsv 521?ptsvx 523?pttrf 311?pttrs 342?pttrsv 2108?ptts2 1548?rot 61, 1189?rotg 63?roti 188?rotm 64?rotmg 67?rscl 1550?sbev 976?sbevd 981?sbevx 990?sbgst 766?sbgv 1111

?sbgvd 1117?sbgvx 1126?sbmv 108?sbtrd 715?scal 69?sctr 190?sdot 56?spcon 384?spev 954?spevd 959?spevx 968?spgst 762?spgv 1086?spgvd 1092?spgvx 1101?spmv 111, 1190?spr 114, 1192?spr2 116?sprfs 424?spsv 543?spsvx 545?sptrd 702?sptrf 320?sptri 450?sptrs 349?stebz 747?stedc 731?stegr 737?stein 751?stemr 726?steqr 722?steqr2 2110?sterf 720?stev 1000?stevd 1002?stevr 1010?stevx 1006?sum1 1198?swap 71?sycon 380?syev 918?syevd 924?syevr 942?syevx 932?sygs2/?hegs2 1551?sygst 757?sygv 1058?sygvd 1065?sygvx 1074

3021

Index

?symm 159?symv 118, 1194?syr 121, 1196?syr2 123?syr2k 166?syrdb 683?syrfs 418?syrk 162?sysv 527?sysvx 530?sytd2/?hetd2 1553?sytf2 1555?sytrd 680?sytrf 313?sytri 446?sytrs 344?tbcon 394?tbmv 125?tbsv 129?tbtrs 360?tgevc 852?tgex2 1559?tgexc 857?tgsen 861?tgsja 884?tgsna 873?tgsy2 1562?tgsyl 868?tpcon 391?tpmv 133?tprfs 433?tpsv 136?tptri 456?tptrs 357?trcon 389?trevc 812?trexc 822?trmm 170?trmv 138?trrfs 430?trsen 825?trsm 173?trsna 817?trsv 141?trsyl 831?trti2 1566?trtri 2547?trtri (LAPACK) 454?trtrs 2529

?trtrs (LAPACK) 354?tzrzf 626?ungbr 661?unghr 788?unglq 594?ungql 604?ungqr 579?ungrq 618?ungtr 697?unmbr 664?unmhr 791?unmlq 596?unmql 610?unmqr 582?unmrq 623?unmrz 632?unmtr 699?upgtr 711?upmtr 713

1-norm valuecomplex Hermitian matrix

packed storage 1320complex Hermitian tridiagonal matrix 1322complex symmetric matrix 1323general rectangular matrix 1310, 1956general tridiagonal matrix 1311Hermitian band matrix 1316real symmetric matrix 1323, 1961real symmetric tridiagonal matrix 1322symmetric band matrix 1314symmetric matrix

packed storage 1318trapezoidal matrix 1331triangular band matrix 1327triangular matrix

packed storage 1329upper Hessenberg matrix 1313, 1958

Aabsolute value of a vector element

largest 72smallest 73

accuracy modes, in VML 2201adding magnitudes of the vector elements 50arguments

matrix 2777

3022


arguments (continued)sparse vector 177vector 2775

array descriptor 1589auxiliary routines

LAPACK 1169ScaLAPACK 1895

Bbalancing a matrix 794band storage scheme 2777basic quasi-number generator

Niederreiter 2290Sobol 2290

basic random number generators 2281, 2289, 2290GFSR 2289MCG, 32-bit 2289MCG, 59-bit 2289Mersenne Twister

MT19937 2290MT2203 2290

MRG 2289Wichmann-Hill 2289

Bernoulli 2369Beta 2362bidiagonal matrix


Binomial 2373bisection 1398BLACS 1589, 2719, 2720, 2721, 2723, 2724, 2725,

2727, 2728, 2729, 2730, 2731, 2732, 2734destruction routines 2727informational routines 2730initialization routines 2719miscellaneous routines 2732blacs_abort 2728blacs_barrier 2732blacs_exit 2729blacs_freebuff 2727blacs_get 2721blacs_gridexit 2728blacs_gridinfo 2730blacs_gridinit 2724blacs_gridmap 2725blacs_pcoord 2731blacs_pinfo 2720

BLACS (continued)blacs_pnum 2731blacs_set 2723blacs_setup 2720usage examples 2734

blacs_abort 2728blacs_barrier 2732blacs_exit 2729blacs_freebuff 2727blacs_get 2721blacs_gridexit 2728blacs_gridinfo 2730blacs_gridinit 2724blacs_gridmap 2725blacs_pcoord 2731blacs_pinfo 2720blacs_pnum 2731blacs_set 2723blacs_setup 2720BLAS Code Examples 2783BLAS Level 1 functions

?asum 49, 50?dot 49, 54?dotc 49, 57?dotu 49, 59?nrm2 49, 60?sdot 49, 56code example 2783dcabs1 49, 75i?amax 49, 72i?amin 49, 73

BLAS Level 1 routines?axpy 49, 51?copy 49, 53?rot 49, 61?rotg 49, 63?rotm 49, 64?rotmg 67?rotmq 49?scal 49, 69?swap 49, 71code example 2784

BLAS Level 2 routines?gbmv 75, 77?gemv 75, 81?ger 75, 84?gerc 75, 86?geru 75, 88?hbmv 75, 90

3023

Index

BLAS Level 2 routines (continued)?hemv 75, 93?her 75, 96?her2 75, 98?hpmv 75, 101?hpr 75, 103?hpr2 75, 105?sbmv 75, 108?spmv 75, 111?spr 75, 114?spr2 75, 116?symv 75, 118?syr 75, 121?syr2 75, 123?tbmv 75, 125?tbsv 75, 129?tpmv 75, 133?tpsv 75, 136?trmv 75, 138?trsv 75, 141code example 2785

BLAS Level 3 routines?gemm 143, 145?hemm 143, 149?her2k 143, 155?herk 143, 152?symm 143, 159?syr2k 143, 166?syrk 143, 162?trmm 143, 170?trsm 143, 173code example 2787

BLAS routinesroutine groups 45

block reflectorgeneral matrix


general rectangular matrixLAPACK 1385ScaLAPACK 1983

triangular factorLAPACK 1389, 1431ScaLAPACK 1994, 2011

block-cyclic distribution 1589block-splitting method 2290BRNG 2281, 2289

Bunch-Kaufman factorization 297, 313, 316, 320, 322, 1593

Hermitian matrixpacked storage 322

symmetric matrixpacked storage 320

CCauchy 2346CBLAS

arguments 2963level 1 (vector operations) 2964level 2 (matrix-vector operations) 2968level 3 (matrix-matrix operations) 2977sparse BLAS 2981

CBLAS to the BLAS 2963Chlosky factorization

Hermitian positive-definite matrix 1831symmetric positive-definite matrix 1831

Cholesky factorizationHermitian positive-definite matrix

band storage 309, 339, 516, 1603, 1618packed storage 306, 508

split 772symmetric positive-definite matrix

band storage 309, 339, 516, 1603, 1618packed storage 306, 508

clag2z 1568code examples

BLAS Level 1 function 2783BLAS Level 1 routine 2784BLAS Level 2 routine 2785BLAS Level 3 routine 2787

CommitDescriptor 2454CommitDescriptorDM 2508communication subprograms 1589complex division in real arithmetic 1227complex Hermitian matrix

1-norm valueLAPACK 1325ScaLAPACK 1961

factorization with diagonal pivoting method 1557Frobenius norm


infinity- normLAPACK 1325

3024


complex Hermitian matrix (continued)infinity- norm (continued)

ScaLAPACK 1961largest absolute value of element


complex Hermitian matrix in packed form1-norm value 1320Frobenius norm 1320infinity- norm 1320largest absolute value of element 1320

complex Hermitian tridiagonal matrix1-norm value 1322Frobenius norm 1322infinity- norm 1322largest absolute value of element 1322

complex matrixcomplex elementary reflector

ScaLAPACK 2007complex symmetric matrix

1-norm value 1323Frobenius norm 1323infinity- norm 1323largest absolute value of element 1323

complex vector1-norm using true absolute value


conjugationLAPACK 1184ScaLAPACK 1902

complex vector conjugationLAPACK 1184ScaLAPACK 1902

compressed sparse vectors 177computational node 2283Computational Routines 561ComputeBackward 2462ComputeBackwardDM 2515ComputeForward 2458ComputeForwardDM 2511condition number

band matrix 365general matrix

LAPACK 363ScaLAPACK 1633, 1636, 1639


condition number (continued)Hermitian positive-definite matrix

band storage 375packed storage 373tridiagonal 378


symmetric positive-definite matrixband storage 375packed storage 373tridiagonal 378

triangular matrixband storage 394packed storage 391

tridiagonal matrix 368configuration parameters, in DFTI 2447Continuous Distribution Generators 2208, 2324Continuous Distributions 2325ConvCopyTask 2435ConvDeleteTask 2433converting a sparse vector into compressed storage

form 185, 187and writing zeros to the original vector 187

converting compressed sparse vectors into full storageform 190ConvInternalPrecision 2417Convolution and Correlation 2394Convolution Functions

?ConvExec 2423?ConvExec1D 2425?ConvExecX 2428?ConvExecX1D 2431?ConvNewTask 2403?ConvNewTask1D 2406?ConvNewTaskX 2408?ConvNewTaskX1D 2411ConvCopyTask 2435ConvDeleteTask 2433ConvSetDecimation 2420ConvSetInternalPrecision 2417ConvSetMode 2415ConvSetStart 2418CorrCopyTask 2435CorrDeleteTask 2433

ConvSetMode 2415ConvSetStart 2418CopyDescriptor 2456

3025

Index

copyingmatrices

distributed 1943global parallel 1945local replicated 1945two-dimensional


vectors 53CopyStream 2308CopyStreamState 2309CorrCopyTask 2435CorrDeleteTask 2433Correlation Functions

?CorrExec 2423?CorrExec1D 2425?CorrExecX 2428?CorrExecX1D 2431?CorrNewTask 2403?CorrNewTask1D 2406?CorrNewTaskX 2408?CorrNewTaskX1D 2411CorrSetDecimation 2420CorrSetInternalPrecision 2417CorrSetMode 2415CorrSetStart 2418

CorrSetInternalDecimation 2420CorrSetInternalPrecision 2417CorrSetMode 2415CorrSetStart 2418Cray 2112CreateDescriptor 2452CreateDescriptorDM 2506

Ddata type

in VML 2201shorthand 42

Data Types 2292dcabs1 75dcg_check 2172dcg_get 2175dcg_init 2171dcgmrhs_check 2178dcgmrhs_get 2182dcgmrhs_init 2176DeleteStream 2307

descriptor configurationcluster DFTI 2506DFTI 2448

descriptor manipulationcluster DFTI 2506DFTI 2448

dfgmres_check 2184dfgmres_get 2188dfgmres_init 2183DFT computation 2448, 2506

cluster DFTI 2506DFT Interface 2447DFT routines

descriptor configurationGetValue 2468GetValueDM 2521SetValue 2465

descriptor manipulationCommitDescriptor 2454CommitDescriptorDM 2508CopyDescriptor 2456CreateDescriptor 2452CreateDescriptorDM 2506FreeDescriptor 2457FreeDescriptorDM 2510

DFT computationComputeBackward 2462ComputeBackwardDM 2515ComputeForward 2458ComputeForwardDM 2511

status checkingErrorClass 2449ErrorMessage 2451

diagonal elementsLAPACK 1475ScaLAPACK 2018

diagonally dominant-like banded matrixsolving systems of linear equations 1627

diagonally dominant-like tridiagonal matrixsolving systems of linear equations 1624

dimension 2775Direct Sparse Solver (DSS) Interface Routines 2136Discrete Distribution Generators 2325Discrete Distributions 2365Discrete Fourier Transform

CommitDescriptor 2454CommitDescriptorDM 2508ComputeBackward 2462ComputeBackwardDM 2515

3026


Discrete Fourier Transform (continued)ComputeForward 2458ComputeForwardDM 2511CopyDescriptor 2456CreateDescriptor 2452CreateDescriptorDM 2506ErrorClass 2449ErrorMessage 2451FreeDescriptor 2457FreeDescriptorDM 2510GetValue 2468GetValueDM 2521SetValue 2465SetValueDM 2518

distributed-memory computations 1589Distribution Generators 2323Distribution Generators Supporting Accurate Mode 2325djacobi 2691, 2700

usage example 2700djacobi_delete 2691djacobi_init 2689djacobi_solve 2690, 2693

usage example 2693dlag2s 1569dNewAbstractStream 2301dot product

complex vectors, conjugated 57complex vectors, unconjugated 59real vectors 54real vectors (extended precision) 56sparse complex vectors 184sparse complex vectors, conjugated 182sparse real vectors 181

driverexpert 1590simple 1590

Driver Routines 470, 891DSS interface, to sparse solver 2136dss_create 2139dtrnlsp

usage example 2652dtrnlsp_delete 2651dtrnlsp_get 2649dtrnlsp_init 2646dtrnlsp_solve 2647dtrnlspbc

usage example 2672dtrnlspbc_delete 2671dtrnlspbc_get 2670

dtrnlspbc_init 2667dtrnlspbc_solve 2668

Eeigenpairs, sorting 2096eigenvalue problems 557, 675, 756, 774, 834, 1775,

2098, 2110general matrix 774, 834, 1775generalized form 756Hermitian matrix 675symmetric matrix 675symmetric tridiagonal matrix 2098, 2110

eigenvalueseigenvalue problems 675

eigenvectorseigenvalue problems 675

elementary reflectorcomplex matrix 2007general matrix 1426, 1998general rectangular matrix

LAPACK 1383, 1392ScaLAPACK 1979, 1988

LAPACK generation 1387ScaLAPACK generation 1992

error diagnostics, in VML 2206error estimation for linear equations

distributed tridiagonal coefficient matrix 1651error handling

pxerbla 2117, 2711xerbla 45, 1588, 2206

ErrorClass 2449ErrorMessage 2451errors in solutions of linear equations

distributed tridiagonal coefficient matrix 1651general matrix

band storage 400Hermitian matrix

packed storage 427Hermitian positive-definite matrix

band storage 412packed storage 409


symmetric positive-definite matrixband storage 412packed storage 409

3027

Index

errors in solutions of linear equations (continued)triangular matrix


tridiagonal matrix 403Euclidean norm

of a vector 60expert driver 1590Exponential 2337

Ffactorization

Bunch-KaufmanLAPACK 297ScaLAPACK 1593

CholeskyLAPACK 297, 1545, 1547ScaLAPACK 2080

diagonal pivotingHermitian matrix

complex 1557packed 552

symmetric matrixindefinite 1555packed 545

LULAPACK 297ScaLAPACK 1593

orthogonalLAPACK 561ScaLAPACK 1667

partialcomplex Hermitian indefinite matrix 1498real/complex symmetric matrix 1496

triangular factorization 297triangular factorization[factorization

aaa] 1593upper trapezoidal matrix 1516

fill-in, for sparse matrices 2746finding

index of the element of a vector with the largestabsolute value of the real part 1903element of a vector with the largest absolute value72element of a vector with the largest absolute valueof the real part and its global index 1904

finding (continued)element of a vector with the smallest absolute value73

font conventions 42Fortran-95 interface conventions

BLAS, Sparse BLAS 47LAPACK 289

Fortran-95 Interfaces for LAPACKabsent from Netlib 2991indentical to Netlib 2984modified Netlib interfaces 2989new functionality 2997with replaced Netlib argument names 2987

Fortran-95 LAPACK interface vs. Netlib 291free_Helmholtz_2D 2607free_Helmholtz_3D 2607free_sph_np 2616free_sph_p 2616free_trig_transform 2570FreeDescriptor 2457FreeDescriptorDM 2510Frobenius norm

complex Hermitian matrixpacked storage 1320

complex Hermitian tridiagonal matrix 1322complex symmetric matrix 1323general rectangular matrix 1310, 1956general tridiagonal matrix 1311Hermitian band matrix 1316real symmetric matrix 1323, 1961real symmetric tridiagonal matrix 1322symmetric band matrix 1314symmetric matrix



full storage scheme 2777full-storage vectors 177function name conventions, in VML 2202

GGamma 2358

3028


gathering sparse vector's elements into compressedform 185, 187

and writing zeros to these elements 187Gauss method, for interval systems 2531, 2554Gauss-Seidel iteration, for interval systems 2537, 2554Gaussian 2329GaussianMV 2332general matrix

block reflector 1428, 2002eigenvalue problems 774, 834, 1775elementary reflector 1426, 1998estimating the condition number

band storage 365inverting matrix


LQ factorization 585, 1684LU factorization

band storage 300, 1199, 1596, 1599, 2101, 2103

matrix-vector productband storage 77

multiplying by orthogonal matrixfrom LQ factorization 1537, 2062from QR factorization 1535, 2057from RQ factorization 1540, 2066from RZ factorization 1542

multiplying by unitary matrixfrom LQ factorization 1537, 2062from QR factorization 1535, 2057from RQ factorization 1540, 2066from RZ factorization 1542

QL factorizationLAPACK 599ScaLAPACK 1698

QR factorizationwith pivoting 567, 570, 1670

rank-l update 84rank-l update, conjugated 86rank-l update, unconjugated 88reduction to bidiagonal form 1201, 1219, 1914reduction to upper Hessenberg form 1918RQ factorization


scalar-matrix-matrix product 145solving systems of linear equations

band storageLAPACK 328

general matrix (continued)solving systems of linear equations (continued)

band storage (continued)ScaLAPACK 1613

general rectangular distributed matrixcomputing scaling factors 1662equilibration 1662

general rectangular matrix1-norm value


block reflectorLAPACK 1385ScaLAPACK 1983

elementary reflectorLAPACK 1383, 1988ScaLAPACK 1979

Frobenius normLAPACK 1310ScaLAPACK 1956

infinity- normLAPACK 1310ScaLAPACK 1956

largest absolute value of elementLAPACK 1310ScaLAPACK 1956

LQ factorizationLAPACK 1205ScaLAPACK 1922

multiplicationLAPACK 1436ScaLAPACK 2016

QL factorizationLAPACK 1207ScaLAPACK 1924

QR factorizationLAPACK 1209ScaLAPACK 1927

reduction of first columnsLAPACK 1283, 1286ScaLAPACK 1951

reduction to bidiagonal form 1935row interchanges


RQ factorizationLAPACK 1210ScaLAPACK 1713, 1930

scaling 1971

3029

Index

general square matrixreduction to upper Hessenberg form 1203trace 2025

general triangular matrixLU factorization

band storage 1906general tridiagonal matrix


general tridiagonal triangular matrixLU factorization

band storage 1910generalized eigenvalue problems 756, 757, 759, 762,

764, 766, 769, 1551, 1553, 1805, 1808, 2083, 2086

complex Hermitian-definite problemband storage 769packed storage 764

real symmetric-definite problemband storage 766packed storage 762

See also LAPACK routines, generalized eigenvalueproblems 756

Generalized LLS Problems 909Generalized Nonsymmetric Eigenproblems 1136generalized Schur factorization 1277, 1381, 1393, 1395Generalized Singular Value Decomposition 878generalized Sylvester equation 868Generalized SymmetricDefinite Eigenproblems 1057generation methods 2283Geometric 2371GetBrngProperties 2389getcpuclocks 2714getcpufrequency 2714GetNumRegBrngs 2322GetStreamStateBrng 2321GetValue 2468GetValueDM 2521GFSR 2284Givens rotation

modified Givens transformation parameters 67of sparse vectors 188parameters 63

global array 1589Gumbel 2356

HHansen-Bliek-Rohn procedure, for interval systems 2539Helmholtz problem

three-dimensional 2585two-dimensional 2581

Helmholtz problem on a spherenon-periodic 2583periodic 2583

Hermitian band matrix1-norm value 1316Frobenius norm 1316infinity- norm 1316largest absolute value of element 1316

Hermitian matrix 90, 93, 96, 98, 101, 103, 105, 149, 152, 155, 316, 322, 346, 352, 382, 386, 448, 452, 675, 756, 1508, 1551, 1553, 1860, 1973, 2026, 2083, 2086

Bunch-Kaufman factorizationpacked storage 322

eigenvalues and eigenvectors 1860estimating the condition number

packed storage 386generalized eigenvalue problems 756inverting the matrix

packed storage 452matrix-vector product


rank-1 updatepacked storage 103


rank-2k update 155rank-n update 152reducing to standard form


reducing to tridiagonal formLAPACK 1508, 1553ScaLAPACK 2026, 2086

scalar-matrix-matrix product 149scaling 1973solving systems of linear equations

packed storage 352Hermitian positive definite distributed matrix

computing scaling factors 1664equilibration 1664

3030


Hermitian positive-definite band matrixCholesky factorization 1545

Hermitian positive-definite distributed matrixinverting the matrix 1658

Hermitian positive-definite matrixCholesky factorization

band storage 309, 1603packed storage 306

estimating the condition numberband storage 375packed storage 373

inverting the matrixpacked storage 444

solving systems of linear equationsband storage 339, 1618packed storage 336

Hermitian positive-definite tridiagonal matrixsolving systems of linear equations 1621

Householder matrixLAPACK 1387ScaLAPACK 1992

Householder method, for interval systems 2533, 2554Householder reflector 2094Hypergeometric 2376

Ii?amax 72i?amin 73i?max1 1197IBM ESSL library 2395IEEE arithmetic 1954IEEE standard

implementation 2113signbit position 2116

ilaenv 1573ilaver 1573ILU0 preconditioner 2190ILU0 Preconditioner Interface Description 2193Incomplete LU Factorization Technique 2190increment 2775iNewAbstractStream 2299infinity-norm

complex Hermitian matrixpacked storage 1320

complex Hermitian tridiagonal matrix 1322complex symmetric matrix 1323general rectangular matrix 1310, 1956

infinity-norm (continued)general tridiagonal matrix 1311Hermitian band matrix 1316real symmetric matrix 1323, 1961real symmetric tridiagonal matrix 1322symmetric band matrix 1314symmetric matrix



Interface Consideration 195interval solver routines

?gegas 2531?gegss 2537?gehbs 2539?gehss 2533?gekws 2535?gemip 2553?gepps 2540?gepss 2543?gerbr 2549?gesvr 2551?geszi 2548?trtri 2547?trtrs 2529

inverse matrix. inverting a matrix 439, 1655, 1658, 1660inverting a matrix

general matrixLAPACK 439ScaLAPACK 1655


Hermitian positive-definite matrixLAPACK 442packed storage 444ScaLAPACK 1658


symmetric positive-definite matrixLAPACK 442packed storage 444ScaLAPACK 1658

triangular distributed matrix 1660triangular matrix

packed storage 456

3031

Index

iparmq 1576Iterative Sparse Solvers 2151Iterative Sparse Solvers based on ReverseCommunication Interface (RCI ISS) 2151

JJacobi matrix calculation routines 2688, 2689, 2690,

2691djacobi 2691djacobi_delete 2691djacobi_init 2689djacobi_solve 2690

KKrawczyk iteration method, for interval systems 2535

LLAPACK

naming conventions 288LAPACK routines

2-by-2 generalized eigenvalue problem 12672-by-2 Hermitian matrix

plane rotation 13812-by-2 orthogonal matrices 12692-by-2 real matrix

generalized Schur factorization 12772-by-2 real nonsymmetric matrix

Schur factorization 13332-by-2 symmetric matrix

plane rotation 13812-by-2 triangular matrix

singular values 1434SVD 1491

approximation to smallest eigenvalue 1480, 1524auxiliary routines

?gbtf2 1199?gebd2 1201?gehd2 1203?gelq2 1205?geql2 1207?geqr2 1209?gerq2 1210?gesc2 1212?getc2 1213

LAPACK routines (continued)auxiliary routines (continued)

?getf2 1215?gtts2 1216?hetf2 1557?isnan 1218?labrd 1219?lacgv 1184?lacn2 1222?lacon 1224?lacpy 1225?lacrm 1185?lacrt 1186?ladiv 1227?lae2 1228?laebz 1229?laed0 1234?laed1 1237?laed2 1239?laed3 1242?laed4 1244?laed5 1246?laed6 1247?laed7 1249?laed8 1253?laed9 1256?laeda 1258?laein 1260?laesy 1187?laev2 1263?laexc 1265?lag2 1267?lags2 1269?lagtf 1271?lagtm 1273?lagts 1275?lagv2 1277?lahef 1498?lahqr 1280?lahr2 1286?lahrd 1283?laic1 1289?laisnan 1218?laln2 1291?lals0 1295?lalsa 1299?lalsd 1303?lamrg 1306?laneg 1307

3032



?langb 1308?lange 1310?langt 1311?lanhb 1316?lanhe 1325?lanhp 1320?lanhs 1313?lansb 1314?lansp 1318?lanst/?lanht 1322?lansy 1323?lantb 1327?lantp 1329?lantr 1331?lanv2 1333?lapll 1334?lapmt 1335?lapy2 1337?lapy3 1337?laqgb 1338?laqge 1340?laqhb 1342?laqp2 1344?laqps 1346?laqr0 1348?laqr1 1352?laqr2 1354?laqr3 1358?laqr4 1362?laqr5 1366?laqsb 1370?laqsp 1372?laqsy 1374?laqtr 1375?lar1v 1378?lar2v 1381?larf 1383?larfb 1385?larfg 1387?larft 1389?larfx 1392?largv 1393?larnv 1395?larra 1396?larrb 1398?larrc 1400?larrd 1402


?larre 1406?larrf 1410?larrj 1413?larrk 1415?larrr 1416?larrv 1418?lartg 1422?lartv 1424?laruv 1425?larz 1426?larzb 1428?larzt 1431?las2 1434?lascl 1436?lasd0 1437?lasd1 1439?lasd2 1443?lasd3 1447?lasd4 1450?lasd5 1452?lasd6 1453?lasd7 1458?lasd8 1462?lasd9 1464?lasda 1467?lasdq 1471?lasdt 1474?laset 1475?lasq1 1476?lasq2 1477?lasq3 1479?lasq4 1480?lasq5 1481?lasq6 1483?lasr 1484?lasrt 1488?lassq 1489?lasv2 1491?laswp 1492?lasy2 1494?lasyf 1496?latbs 1501?latdf 1503?latps 1506?latrd 1508?latrs 1512?latrz 1516

3033

Index


?lauu2 1519?lauum 1520?lazq3 1521?lazq4 1524?org2l/?ung2l 1526?org2r/?ung2r 1527?orgl2l/?ungl2 1529?orgr2/?ungr2 1531?orm2l/?unm2l 1532?orm2r/?unm2r 1535?orml2/?unml2 1537?ormr2/?unmr2 1540?ormr3/?unmr3 1542?pbtf2 1545?potf2 1547?ptts2 1548?rot 1189?rscl 1550?spmv 1190?spr 1192?sum1 1198?sygs2/?hegs2 1551?symv 1194?syr 1196?sytd2/?hetd2 1553?sytf2 1555?tgex2 1559?tgsy2 1562?trti2 1566clag2z 1568dlag2s 1569i?max1 1197slag2d 1570zlag2c 1571

bidiagonal divide and conquer 1474block reflector

triangular factor 1389, 1431checking for characters equality 1579checking for safe infinity 1578checking for strings equality 1580complex Hermitian matrix

packed storage 1320complex Hermitian tridiagonal matrix 1322complex matrix multiplication 1185complex symmetric matrix

computing eigenvalues and eigenvectors 1187matrix-vector product 1194

LAPACK routines (continued)complex symmetric matrix (continued)

symmetric rank-1 update 1196complex symmetric packed matrix

symmetric rank-1 update 1192complex vector

1-norm using true absolute value 1198index of element with max absolute value 1197linear transformation 1186matrix-vector product 1190plane rotation 1189

complex vector conjugation 1184condition number estimation

?disna 754?gbcon 365?gecon 363?gtcon 368?hecon 382?hpcon 386?pbcon 375?pocon 371?ppcon 373?ptcon 378?spcon 384?sycon 380?tbcon 394?tpcon 391?trcon 389

determining machine parameters 1583, 1584dqd transform 1483dqds transform 1481driver routines

generalized LLS problems?ggglm 914?gglse 910

generalized nonsymmetric eigenproblems?gges 1137?ggesx 1144?ggev 1153?ggevx 1159

generalized symmetric definite eigenproblems?hbgv 1114?hbgvd 1121?hbgvx 1131?hegv 1062?hegvd 1069?hegvx 1080?hpgv 1089?hpgvd 1096

3034


LAPACK routines (continued)driver routines (continued)

generalized symmetric definite eigenproblems(continued)

?hpgvx 1106?sbgv 1111?sbgvd 1117?sbgvx 1126?spgv 1086?spgvd 1092?spgvx 1101?sygv 1058?sygvd 1065?sygvx 1074

linear least squares problems?gels 892?gelsd 905?gelss 901?gelsy 896?lals0 (auxiliary) 1295?lalsa (auxiliary) 1299?lalsd (auxiliary) 1303

nonsymmetric eigenproblems?gees 1016?geesx 1022?geev 1028?geevx 1033

singular value decomposition?gelsd 905?gesdd 1046?gesvd 1041?ggsvd 1051

solving linear equations?gbsv 481?gbsvx 484?gesv 471?gesvx 475?gtsv 491?gtsvx 493?hesv 535?hesvx 538?hpsv 550?hpsvx 552?pbsv 513?pbsvx 516?posv 498?posvx 500?ppsv 505?ppsvx 508

LAPACK routines (continued)driver routines (continued)

solving linear equations (continued)?ptsv 521?ptsvx 523?spsv 543?spsvx 545?sysv 527?sysvx 530

symmetric eigenproblems?hbev 979?hbevd 986?hbevx 995?heev 921?heevd 928?heevr 948?heevx 937?hpev 957?hpevd 963?hpevx 972?sbev 976?sbevd 981?sbevx 990?spev 954?spevd 959?spevx 968?stev 1000?stevd 1002?stevr 1010?stevx 1006?syev 918?syevd 924?syevr 942?syevx 932

environmental enquiry 1573, 1576finding a relatively isolated eigenvalue 1410general band matrix

equilibration 1338general matrix

block reflector 1428elementary reflector 1426reduction to bidiagonal form 1201, 1219

general rectangular matrixblock reflector 1385elementary reflector 1383, 1392equilibration 1340LQ factorization 1205plane rotation 1484QL factorization 1207

3035

Index

LAPACK routines (continued)general rectangular matrix (continued)

QR factorization 1209row interchanges 1492RQ factorization 1210

general square matrixreduction to upper Hessenberg form 1203

general tridiagonal matrix 1271, 1273, 1275, 1311, 1406, 1418generalized eigenvalue problems

?hbgst 769?hegst 759?hpgst 764?pbstf 772?sbgst 766?spgst 762?sygst 757

generalized SVD?ggsvp 879?tgsja 884

generalized Sylvester equation?tgsyl 868

Hermitian band matrixequilibration 1342, 1374

Hermitian band matrix in packed storageequilibration 1372

Hermitian matrixcomputing eigenvalues and eigenvectors 1263

Householder matrixelementary reflector 1387

incremental condition estimation 1289linear dependence of vectors 1334LQ factorization

?gelq2 1205?gelqf 585?orglq 588?ormlq 591?unglq 594?unmlq 596

LU factorizationgeneral band matrix 1199

matrix equilibration?gbequ 460?geequ 458?laqgb 1338?laqge 1340?laqhb 1342?laqsb 1370?laqsp 1372

LAPACK routines (continued)matrix equilibration (continued)

?laqsy 1374?pbequ 467?poequ 463?ppequ 465

matrix inversion?getri 439?hetri 448?hptri 452?potri 442?pptri 444?sptri 450?sytri 446?tptri 456?trtri 454

matrix-matrix product?lagtm 1273

merging sets of singular values 1443, 1458mixed precision iterative refinement subroutines471, 1568, 1569, 1570, 1571nonsymmetric eigenvalue problems

?gebak 798?gebal 794?gehrd 779?hsein 806?hseqr 800?orghr 781?ormhr 784?trevc 812?trexc 822?trsen 825?trsna 817?unghr 788?unmhr 791

off-diagonal and diagonal elements 1475permutation list creation 1306permutation of matrix columns 1335plane rotation 1422, 1424, 1484plane rotation vector 1393QL factorization

?geql2 1207?geqlf 599?orgql 602?ormql 607?ungql 604?unmql 610

QR factorization?geqp3 570

3036


LAPACK routines (continued)QR factorization (continued)

?geqpf 567?geqr2 1209?geqrf 563?ggqrf 635?ggrqf 639?laqp2 1344?laqps 1346?orgqr 573?ormqr 576?ungqr 579?unmqr 582p?geqrf 1667

random numbers vector 1395real lower bidiagonal matrix

SVD 1471real square bidiagonal matrix

singular values 1476real symmetric matrix 1323real symmetric tridiagonal matrix 1229, 1322real upper bidiagonal matrix

singular values 1437SVD 1439, 1467, 1471

real upper quasi-triangular matrixorthogonal similarity transformation 1265

reciprocal condition numbers for eigenvaluesand/or eigenvectors

?tgsna 873RQ factorization

?geqr2 1210?gerqf 613?orgrq 616?ormrq 620?ungrq 618?unmrq 623

RZ factorization?ormrz 629?tzrzf 626?unmrz 632

singular value decomposition?bdsdc 672?bdsqr 668?gbbrd 650?gebrd 646?orgbr 653?ormbr 657?ungbr 661?unmbr 664

LAPACK routines (continued)solution refinement and error estimation

?gbrfs 400?gerfs 397?gtrfs 403?herfs 421?hprfs 427?pbrfs 412?porfs 406?pprfs 409?ptrfs 415?sprfs 424?syrfs 418?tbrfs 436?tprfs 433?trrfs 430

solving linear equations?gbtrs 328?getrs 325?gttrs 331?hetrs 346?hptrs 352?laln2 1291?laqtr 1375?pbtrs 339?potrs 334?pptrs 336?pttrs 342?sptrs 349?sytrs 344?tbtrs 360?tptrs 357?trtrs 354

sorting numbers 1488square root 1337square roots 1447, 1450, 1452, 1462, 1464, 1581Sylvester equation

?lasy2 1494?tgsy2 1562?trsyl 831

symmetric band matrixequilibration 1370, 1374

symmetric band matrix in packed storageequilibration 1372

symmetric eigenvalue problems?disna 754?hbtrd 718?herdb 686?hetrd 694

3037

Index

LAPACK routines (continued)symmetric eigenvalue problems (continued)

?hptrd 709?opgtr 704?opmtr 706?orgtr 689?ormtr 691?pteqr 743?sbtrd 715?sptrd 702?stebz 747?stedc 731?stegr 737?stein 751?stemr 726?steqr 722?sterf 720?syrdb 683?sytrd 680?ungtr 697?unmtr 699?upgtr 711?upmtr 713auxiliary

?lae2 1228?laebz 1229?laed0 1234?laed1 1237?laed2 1239?laed3 1242?laed4 1244?laed5 1246?laed6 1247?laed7 1249?laed8 1253?laed9 1256?laeda 1258

symmetric matrixcomputing eigenvalues and eigenvectors 1263packed storage 1318

symmetric positive-definite tridiagonal matrixeigenvalues 1477

trapezoidal matrix 1331, 1516triangular factorization

?gbtrf 300?getrf 297?gttrf 302?hetrf 316?hptrf 322

LAPACK routines (continued)triangular factorization (continued)

?pbtrf 309?potrf 304?pptrf 306?pttrf 311?sptrf 320?sytrf 313p?dbtrf 1599

triangular matrixpacked storage 1329

triangular system of equations 1506, 1512tridiagonal band matrix 1327uniform distribution 1425unreduced symmetric tridiagonal matrix 1234updated upper bidiagonal matrix

SVD 1453updating sum of squares 1489upper Hessenberg matrix

computing a specified eigenvector 1260eigenvalues 1280Schur factorization 1280

utility functions and routines?labad 1581?lamc1 1583?lamc2 1584?lamc3 1585?lamc4 1585?lamc5 1586?lamch 1582ieeeck 1578ilaenv 1573ilaver 1573iparmq 1576lsame 1579lsamen 1580second/dsecnd 1587xerbla 1588

Laplace 2340Laplace problem

three-dimensional 2586two-dimensional 2582

largest absolute value of elementcomplex Hermitian matrix

packed storage 1320complex Hermitian tridiagonal matrix 1322complex symmetric matrix 1323general rectangular matrix 1310, 1956general tridiagonal matrix 1311

3038


largest absolute value of element (continued)Hermitian band matrix 1316real symmetric matrix 1323, 1961real symmetric tridiagonal matrix 1322symmetric band matrix 1314symmetric matrix



leading dimension 2780leapfrog method 2290LeapfrogStream 2313least-squares problems 557length. dimension 2775library verion 2706Library Version Obtaining 2706library version string 2709linear combination of vectors 51Linear Congruential Generator 2284linear equations, solving 325, 328, 331, 334, 336,

339, 342, 344, 346, 349, 352, 354, 357, 360, 471, 475, 481, 484, 491, 493, 498, 500, 505, 508, 513, 516, 521, 523, 527, 530, 535, 538, 543, 545, 550, 552, 1291, 1375, 1378, 1611, 1613, 1616, 1618, 1621, 1624, 1627, 1630, 1811, 1813, 1820, 1823, 1826, 1829, 1831, 1838, 1841, 1844, 2071, 2106, 2108

tridiagonal symmetric positive-definite matrixLAPACK 521ScaLAPACK 1841

band matrixLAPACK 481, 484ScaLAPACK 1820

Cholesky-factored matrixLAPACK 339ScaLAPACK 1618

diagonally dominant-like matrixbanded 1627tridiagonal 1624

general band matrixScaLAPACK 1823

general matrixband storage 328, 1613

general tridiagonal matrixScaLAPACK 1826

linear equations, solving (continued)Hermitian matrix

error bounds 538, 552packed storage 352, 550, 552

Hermitian positive-definite matrixband storage


error boundsLAPACK 500ScaLAPACK 1831

LAPACKlinear equations, solving

multiple right-hand sidessymmetric positive-definitematrix 498

packed storage 336, 505, 508ScaLAPACK 1831

Hermitian positive-definite tridiagonal linearequations 2108Hermitian positive-definite tridiagonal matrix 1621multiple right-hand sides

band matrixLAPACK 481, 484ScaLAPACK 1820

Hermitian matrix 535, 550Hermitian positive-definite matrix

band storage 513square matrix


symmetric matrix 527, 543symmetric positive-definite matrix

band storage 513tridiagonal matrix 491, 493

overestimated or underestimated system 1844square matrix

error boundsLAPACK 475, 484ScaLAPACK 1813


symmetric matrixerror bounds 530, 545packed storage 349, 543, 545

symmetric positive-definite matrixband storage


3039

Index

linear equations, solving (continued)symmetric positive-definite matrix (continued)

error boundsLAPACK 500ScaLAPACK 1831

LAPACK 498, 500packed storage 336, 505, 508ScaLAPACK 1829, 1831

symmetric positive-definite tridiagonal linearequations 2108triangular matrix

band storage 360, 2071packed storage 357

tridiagonal Hermitian positive-definite matrixerror bounds 523LAPACK 521ScaLAPACK 1841

tridiagonal matrixerror bounds 493LAPACK 331, 342, 491, 493LAPACK auxiliary 1378ScaLAPACK auxiliary 2106

tridiagonal symmetric positive-definite matrixerror bounds 523

Linear Least Squares (LLS) Problems 892LoadStreamF 2312Lognormal 2352LQ factorization 561, 588, 594, 1205, 1687, 1689,

1922computing the elements of

orthogonal matrix Q 588real orthogonal matrix Q 1687unitary matrix Q 594, 1689

general rectangular matrix 1205, 1922lsame 1579, 2712lsamen 1580, 2712LU factorization 297, 300, 302, 1199, 1212, 1213,

1215, 1216, 1271, 1275, 1503, 1594, 1596, 1599, 1608, 1813, 1906, 1910, 1933, 2101, 2103, 2105

band matrixblocked algorithm 2103unblocked algorithm 2101

diagonally dominant-like tridiagonal matrix 1608general band matrix 1199general matrix 1215, 1933solving linear equations

general matrix 1212square matrix 1813

LU factorization (continued)solving linear equations (continued)

tridiagonal matrix 1216, 1275triangular band matrix 1906tridiagonal band matrix 1910tridiagonal matrix 302, 1271, 2105with complete pivoting 1213, 1503with partial pivoting 1215, 1933

Mmachine parameters


matrix arguments 2775, 2777, 2780column-major ordering 2775, 2780example 2780leading dimension 2780number of columns 2780number of rows 2780transposition parameter 2780

matrix blockQR factorization

with pivoting 1344matrix equation

AX = B 173, 294, 325, 1590, 1611matrix one-dimensional substructures 2775matrix-matrix operation

productgeneral matrix 145

rank-2k updateHermitian matrix 155symmetric matrix 166

rank-n updateHermitian matrix 152symmetric matrix 162

scalar-matrix-matrix productHermitian matrix 149symmetric matrix 159

matrix-matrix operation:scalar-matrix-matrix producttriangular matrix 170

matrix-vector operationproduct

Hermitian matrix 90, 93, 101real symmetric matrix 111, 118triangular matrix 125, 133, 138

rank-1 updateHermitian matrix 96, 103

3040


matrix-vector operation (continued)rank-1 update (continued)

real symmetric matrix 114, 121rank-2 update

Hermitian matrix 98, 105symmetric matrix 116, 123

matrix-vector operation:productHermitian matrix


real symmetric matrixpacked storage 111

symmetric matrixband storage 108


matrix-vector operation:rank-1 updateHermitian matrix

packed storage 103real symmetric matrix

packed storage 114matrix-vector operation:rank-2 update



mkl_cspblas_dbsrsymv 236mkl_cspblas_dcsrsymv 210mkl_dbsrmv 231mkl_dbsrsymv 234mkl_dcoogemv 217mkl_dcoomm 264mkl_dcoomv 215mkl_dcoosm 277mkl_dcoosv 246mkl_dcoosymv 219mkl_dcootrsv 248mkl_dcscmm 261mkl_dcscmv 212mkl_dcscsm 275mkl_dcscsv 243mkl_dcsrgemv 206mkl_dcsrmm 258mkl_dcsrmv 203mkl_dcsrsm 272mkl_dcsrsv 238mkl_dcsrsymv 208mkl_dcsrtrsv 241

mkl_ddiagemv 224mkl_ddiamm 266mkl_ddiamv 222mkl_ddiasm 280mkl_ddiasv 251mkl_ddiasymv 226mkl_ddiatrsv 253mkl_dskymm 269mkl_dskymv 228mkl_dskysm 282mkl_dskysv 256MKL_FreeBuffers 2715MKLGetVersion 2706MKLGetVersionStirng 2709MPI 1589Multiplicative Congruential Generator 2284

Nnaming conventions 42, 45, 177, 191, 558, 1590,

2202BLAS 45LAPACK 558, 1590Sparse BLAS Level 1 177Sparse BLAS Level 2 191Sparse BLAS Level 3 191VML 2202

negative eigenvalues 1954NegBinomial 2383NewStream 2296NewStreamEx 2297NewTaskX1D 2411nonlinear least square problem 2999Nonsymmetric Eigenproblems 1015

Ooff-diagonal elements

initialization 2018LAPACK 1475ScaLAPACK 2018

one-dimensional FFTsstorage effects 2485, 2487

optimization solvers basics 2999

3041

Index

orthogonal matrix 643, 675, 774, 834, 1526, 1527, 1529, 1531, 1532, 1775, 1789, 2041, 2044, 2047, 2050, 2053

from LQ factorizationLAPACK 1529ScaLAPACK 2047

from QL factorizationLAPACK 1526, 1532ScaLAPACK 2041, 2053

from QR factorizationLAPACK 1527ScaLAPACK 2044

from RQ factorizationLAPACK 1531ScaLAPACK 2050

Pp?dbsv 1823p?dbtrf 1599p?dbtrs 1627p?dbtrsv 1906p?dtsv 1826p?dttrf 1608p?dttrs 1624p?dttrsv 1910p?gbsv 1820p?gbtrf 1596p?gbtrs 1613p?gebd2 1914p?gebrd 1790p?gecon 1633p?geequ 1662p?gehd2 1918p?gehrd 1775p?gelq2 1922p?gelqf 1684p?gels 1844p?geql2 1924p?geqlf 1698p?geqpf 1670p?geqr2 1927p?geqrf 1667p?gerfs 1643p?gerq2 1930p?gerqf 1713p?gesv 1811p?gesvd 1869

p?gesvx 1813p?getf2 1933p?getrf 1594p?getri 1655p?getrs 1611p?ggqrf 1738p?ggrqf 1743p?heevx 1860p?hegst 1808p?hegvx 1883p?hetrd 1757p?labad 2112p?labrd 1935p?lachkieee 2113p?lacon 1940p?laconsb 1942p?lacp2 1943p?lacp3 1945p?lacpy 1947p?laevswp 1949p?lahqr 1787p?lahrd 1951p?laiect 1954p?lamch 2114p?lange 1956p?lanhs 1958p?lantr 1964p?lapiv 1967p?laqge 1971p?laqsy 1973p?lared1d 1976p?lared2d 1977p?larf 1979p?larfb 1983p?larfc 1988p?larfg 1992p?larft 1994p?larz 1998p?larzb 2002p?larzt 2011p?lascl 2016p?laset 2018p?lasmsub 2020p?lasnbt 2116p?lassq 2021p?laswp 2023p?latra 2025p?latrd 2026p?latrz 2034

3042


p?lauu2 2037p?lauum 2039p?lawil 2040p?max1 1903p?org2l/p?ung2l 2041p?org2r/p?ung2r 2044p?orgl2/p?ungl2 2047p?orglq 1687p?orgql 1701p?orgqr 1673p?orgr2/p?ungr2 2050p?orgrq 1716p?orm2l/p?unm2l 2053p?orm2r/p?unm2r 2057p?ormbr 1795p?ormhr 1779p?orml2/p?unml2 2062p?ormlq 1691p?ormql 1706p?ormqr 1677p?ormr2/p?unmr2 2066p?ormrq 1720p?ormrz 1731p?ormtr 1753p?pbsv 1838p?pbtrf 1603p?pbtrs 1618p?pbtrsv 2071p?pocon 1636p?poequ 1664p?porfs 1647p?posv 1829p?posvx 1831p?potf2 2080p?potrf 1601p?potri 1658p?potrs 1616p?ptsv 1841p?pttrf 1606p?pttrs 1621p?pttrsv 2076p?rscl 2082p?stebz 1766p?stein 1770p?sum1 1905p?syev 1849p?syevx 1852p?sygs2/p?hegs2 2083p?sygst 1805

p?sygvx 1874p?sytd2/p?hetd2 2086p?sytrd 1749p?trcon 1639p?trrfs 1651p?trti2 2090p?trtri 1660p?trtrs 1630p?tzrzf 1727p?unglq 1689p?ungql 1703p?ungqr 1675p?ungrq 1718p?unmbr 1800p?unmhr 1783p?unmlq 1695p?unmql 1709p?unmqr 1681p?unmrq 1724p?unmrz 1734p?unmtr 1762Packed formats 2478packed storage scheme 2777parallel direct solver (Pardiso) 2119parameter partitioning, for interval systems 2540parameters

for a Givens rotation 63modified Givens transformation 67

PARDISO 2119pardiso function 2120Partial Differential Equations support 2557, 2581,

2582, 2583, 2585, 2586Helmholtz problem on a sphere 2582Poisson problem on a sphere 2583three-dimensional Helmholtz problem 2585three-dimensional Laplace problem 2586three-dimensional Poisson problem 2586two-dimensional Helmholtz problem 2581two-dimensional Laplace problem 2582two-dimensional Poisson problem 2582

PDE support 2557PDE Support Code Examples 2905pdlaiectb 1954pdlaiectl 1954permutation matrix 2745pivoting matrix rows or columns 1967PL Interface 2579platforms supported 39

3043

Index

points rotationin the modified plane 64in the plane 61

Poisson 2378Poisson Library 2579, 2580, 2588, 2592, 2595, 2601,

2607, 2608, 2611, 2613, 2616, 2926routines

?_commit_Helmholtz_2D 2595?_commit_Helmholtz_3D 2595?_commit_sph_np 2611?_commit_sph_p 2611?_Helmholtz_2D 2601?_Helmholtz_3D 2601?_init_Helmholtz_2D 2592?_init_Helmholtz_3D 2592?_init_sph_np 2608?_init_sph_p 2608?_sph_np 2613?_sph_p 2613code examples 2926free_Helmholtz_2D 2607free_Helmholtz_3D 2607free_sph_np 2616free_sph_p 2616

structure 2580Poisson problem

on a sphere 2583three-dimensional 2586two-dimensional 2582

PoissonV 2381preconditioners based on incomplete LU factorization

2190, 2194dcsrilu0 2194

preconditioning, of an interval system 2554process grid 1589product

matrix-vectorgeneral matrix 77, 81Hermitian matrix 90, 93, 101real symmetric matrix 111, 118triangular matrix 125, 133, 138

scalar-matrixgeneral matrix 145Hermitian matrix 149

scalar-matrix-matrixgeneral matrix 145Hermitian matrix 149symmetric matrix 159triangular matrix 170

product (continued)vector-scalar 69

product:matrix-vectorgeneral matrix

band storage 77Hermitian matrix


real symmetric matrixpacked storage 111

symmetric matrixband storage 108


pseudorandom numbers 2281pslaiect 1954pxerbla 2117, 2711

QQL factorization

computing the elements ofcomplex matrix Q 604orthogonal matrix Q 1701real matrix Q 602unitary matrix Q 1703

general rectangular matrixLAPACK 1207ScaLAPACK 1924

multiplying general matrix byorthogonal matrix Q 1706unitary matrix Q 1709

QR factorization 561, 567, 570, 573, 579, 1209, 1210, 1344, 1346, 1670, 1673, 1675, 1927, 1930

computing the elements oforthogonal matrix Q 573, 1673unitary matrix Q 579, 1675

general rectangular matrixLAPACK 1209, 1210ScaLAPACK 1927, 1930

with pivotingScaLAPACK 1670

quasi-random numbers 2281quasi-triangular matrix

LAPACK 774, 834ScaLAPACK 1775

quasi-triangular system of equations 1375

3044


Rrandom number generators 2281random stream 2292Random Streams 2292rank-1 update

conjugated, general matrix 86general matrix 84Hermitian matrix


packed storage 114unconjugated, general matrix 88

rank-2 updateHermitian matrix

packed storage 105symmetric matrix

packed storage 116rank-2k update

Hermitian matrix 155symmetric matrix 166

rank-n updateHermitian matrix 152symmetric matrix 162

Rayleigh 2349RCI CG Interface 2156RCI CG sparse solver routines

dcg 2173, 2179dcg_check 2172dcg_get 2175dcg_init 2171dcgmrhs_check 2178dcgmrhs_get 2182dcgmrhs_init 2176

RCI FGMRES Interface 2162RCI FGMRES sparse solver routines

dfgmres_check 2184dfgmres_get 2188dfgmres_init 2183

RCI GFMRES sparse solver routinesdfgres 2185

RCI ISS 2151RCI ISS interface 2151RCI ISS sparse solver routines

implementation details 2189real matrix

QR factorizationwith pivoting 1346

real symmetric matrix1-norm value 1323Frobenius norm 1323infinity- norm 1323largest absolute value of element 1323

real symmetric tridiagonal matrix1-norm value 1322Frobenius norm 1322infinity- norm 1322largest absolute value of element 1322

reducing generalized eigenvalue problemsLAPACK 757ScaLAPACK 1805

reduction to upper Hessenberg formgeneral matrix 1918general square matrix 1203

refining solutions of linear equationsband matrix 400general matrix 397, 1643Hermitian matrix




symmetric positive-definite matrixband storage 412packed storage 409

symmetric/Hermitian positive-definite distributedmatrix 1647tridiagonal matrix 403

RegisterBrng 2388registering a basic generator 2385reordering of matrices 2746Reverse Communication Interface 2151Rex-Rohn test 2550, 2551Ris-Beeck spectral criterion 2550rotation

of points in the modified plane 64of points in the plane 61of sparse vectors 188parameters for a Givens rotation 63parameters of modified Givens transformation 67

routine group 40routine name conventions

BLAS 45Sparse BLAS Level 1 177Sparse BLAS Level 2 191

3045

Index

routine name conventions (continued)Sparse BLAS Level 3 191

RQ factorizationcomputing the elements of

complex matrix Q 618orthogonal matrix Q 1716real matrix Q 616unitary matrix Q 1718

Rump criterion 2551

SSaveStreamF 2310ScaLAPACK 1589ScaLAPACK routines

1D array redistribution 1976, 1977auxiliary routines

?combamax1 1904?dbtf2 2101?dbtrf 2103?dttrf 2105?dttrsv 2106?lamsh 2092?laref 2094?lasorte 2096?lasrt2 2097?pttrsv 2108?stein2 2098?steqr2 2110p?dbtrsv 1906p?dttrsv 1910p?gebd2 1914p?gehd2 1918p?gelq2 1922p?geql2 1924p?geqr2 1927p?gerq2 1930p?getf2 1933p?labrd 1935p?lacgv 1902p?lacon 1940p?laconsb 1942p?lacp2 1943p?lacp3 1945p?lacpy 1947p?laevswp 1949p?lahrd 1951p?laiect 1954

ScaLAPACK routines (continued)auxiliary routines (continued)

p?lange 1956p?lanhs 1958p?lansy, p?lanhe 1961p?lantr 1964p?lapiv 1967p?laqge 1971p?laqsy 1973p?lared1d 1976p?lared2d 1977p?larf 1979p?larfb 1983p?larfc 1988p?larfg 1992p?larft 1994p?larz 1998p?larzb 2002p?larzc 2007p?larzt 2011p?lascl 2016p?laset 2018p?lasmsub 2020p?lassq 2021p?laswp 2023p?latra 2025p?latrd 2026p?latrs 2031p?latrz 2034p?lauu2 2037p?lauum 2039p?lawil 2040p?max1 1903p?org2l/p?ung2l 2041p?org2r/p?ung2r 2044p?orgl2/p?ungl2 2047p?orgr2/p?ungr2 2050p?orm2l/p?unm2l 2053p?orm2r/p?unm2r 2057p?orml2/p?unml2 2062p?ormr2/p?unmr2 2066p?pbtrsv 2071p?potf2 2080p?pttrsv 2076p?rscl 2082p?sum1 1905p?sygs2/p?hegs2 2083p?sytd2/p?hetd2 2086p?trti2 2090

3046


ScaLAPACK routines (continued)auxiliary routines (continued)

pdlaiectb 1954pdlaiectl 1954pslaiect 1954

block reflectortriangular factor 1994, 2011

Cholesky factorization 1606complex matrix

complex elementary reflector 2007complex vector

1-norm using true absolute value 1905complex vector conjugation 1902condition number estimation

p?gecon 1633p?pocon 1636p?trcon 1639

driver routinesp?dbsv 1823p?dtsv 1826p?gbsv 1820p?gels 1844p?gesv 1811p?gesvd 1869p?gesvx 1813p?heevx 1860p?hegvx 1883p?pbsv 1838p?posv 1829p?posvx 1831p?ptsv 1841p?syev 1849p?syevx 1852p?sygvx 1874

error estimationp?trrfs 1651

error handlingpxerbla 2117, 2711

general matrixblock reflector 2002elementary reflector 1998LU factorization 1933reduction to upper Hessenberg form 1918

general rectangular matrixelementary reflector 1979LQ factorization 1922QL factorization 1924QR factorization 1927reduction to bidiagonal form 1935

ScaLAPACK routines (continued)general rectangular matrix (continued)

reduction to real bidiagonal form 1914row interchanges 2023RQ factorization 1930

generalized eigenvalue problemsp?hegst 1808p?sygst 1805

Householder matrixelementary reflector 1992

LQ factorizationp?gelq2 1922p?gelqf 1684p?orglq 1687p?ormlq 1691p?unglq 1689p?unmlq 1695

LU factorizationp?dbtrsv 1906p?dttrf 1608p?dttrsv 1910p?getf2 1933

matrix equilibrationp?geequ 1662p?poequ 1664

matrix inversionp?getri 1655p?potri 1658p?trtri 1660

nonsymmetric eigenvalue problemsp?gehrd 1775p?lahqr 1787p?ormhr 1779p?unmhr 1783

QL factorization?geqlf 1698?ungql 1703p?geql2 1924p?orgql 1701p?ormql 1706p?unmql 1709

QR factorizationp?geqpf 1670p?geqr2 1927p?ggqrf 1738p?orgqr 1673p?ormqr 1677p?ungqr 1675p?unmqr 1681

3047

Index

ScaLAPACK routines (continued)RQ factorization

p?gerq2 1930p?gerqf 1713p?ggrqf 1743p?orgrq 1716p?ormrq 1720p?ungrq 1718p?unmrq 1724

RZ factorizationp?ormrz 1731p?tzrzf 1727p?unmrz 1734

singular value decompositionp?gebrd 1790p?ormbr 1795p?unmbr 1800

solution refinement and error estimationp?gerfs 1643p?porfs 1647

solving linear equations?dttrsv 2106?pttrsv 2108p?dbtrs 1627p?dttrs 1624p?gbtrs 1613p?getrs 1611p?potrs 1616p?pttrs 1621p?trtrs 1630

symmetric eigenproblemsp?hetrd 1757p?ormtr 1753p?stebz 1766p?stein 1770p?sytrd 1749p?unmtr 1762

symmetric eigenvalue problems?stein2 2098?steqr2 2110

trapezoidal matrix 2034triangular factorization

?dbtrf 2103?dttrf 2105p?dbtrsv 1906p?dttrsv 1910p?gbtrf 1596p?getrf 1594p?pbtrf 1603

ScaLAPACK routines (continued)triangular factorization (continued)

p?potrf 1601p?pttrf 1606

triangular system of equations 2031updating sum of squares 2021utility functions and routines

p?labad 2112p?lachkieee 2113p?lamch 2114p?lasnbt 2116pxerbla 2117, 2711

scalar-matrix product 145, 149, 159scalar-matrix-matrix product 145, 149, 159, 170

general matrix 145symmetric matrix 159triangular matrix 170

scalinggeneral rectangular matrix 1971symmetric/Hermitian matrix 1973

scaling factorsgeneral rectangular distributed matrix 1662Hermitian positive definite distributed matrix 1664symmetric positive definite distributed matrix 1664

scattering compressed sparse vector's elements intofull storage form 190Schulz interval procedure 2548Schulz iterative method 2548Schur decomposition 857, 861Schur factorization 1277, 1280, 1333second/dsecnd 2713Service Functions 2204Service Routines 2294setcpufrequency 2715SetInternalDecimation 2420SetValue 2465SetValueDM 2518simple driver 1590single node matrix 2092singular value decomposition

LAPACK 643, 1041LAPACK routines, singular value

decomposition[singular valuedecomposition

aaa] 1789ScaLAPACK 1789, 1869See also LAPACK routines, singular valuedecomposition 643

Singular Value Decomposition 1040

3048


SkipAheadStream 2317slag2d 1570small subdiagonal element 2020smallest absolute value of a vector element 73sNewAbstractStream 2304solution partitioning 2543solver

direct 2743iterative 2743

SolverSparse 2119

solving linear equations 328solving linear equations. linear equations 1613solving linear equations. See linear equations 1291sorting

eigenpairs 2096numbers in increasing/decreasing order


Sparse BLAS Level 1 176, 177data types 177naming conventions 177

Sparse BLAS Level 1 routines and functions 177, 179, 181, 182, 184, 185, 187, 188, 190

?axpyi 179?dotci 182?doti 181?dotui 184?gthr 185?gthrz 187?roti 188?sctr 190

Sparse BLAS Level 2 191naming conventions 191

sparse BLAS Level 2 routinesmkl_cspblas_dbsrsymv 236mkl_cspblas_dcsrsymv 210mkl_dbsrgemvmkl_dbsrmv 231mkl_dbsrsymv 234mkl_dcoogemv 217mkl_dcoomv 215mkl_dcoosv 246mkl_dcoosymv 219mkl_dcootrsv 248mkl_dcscmv 212mkl_dcscsv 243mkl_dcsrgemv 206mkl_dcsrmv 203

sparse BLAS Level 2 routines (continued)mkl_dcsrsv 238mkl_dcsrsymv 208mkl_dcsrtrsv 241mkl_ddiagemv 224mkl_ddiamv 222mkl_ddiasv 251mkl_ddiasymv 226mkl_ddiatrsv 253mkl_dskymv 228mkl_dskysv 256

Sparse BLAS Level 3 191naming conventions 191

sparse BLAS Level 3 routinesmkl_dcoomm 264mkl_dcoosm 277mkl_dcscmm 261mkl_dcscsm 275mkl_dcsrmm 258mkl_dcsrsm 272mkl_ddiamm 266mkl_ddiasm 280mkl_dskymm 269mkl_dskysm 282

sparse matrices 191sparse matrix 191Sparse Matrix Data Structures 193Sparse Solver

direct sparse solver interfacedss_create 2139dss_define_structure

dss_define_structure 2140dss_delete 2145dss_factor_real, dss_factor_complex 2142dss_reorder 2141dss_solve_real, dss_solve_complex 2143dss_statistics 2145mkl_cvt_to_null_terminated_str 2149

iterative sparse solver interfacedcg 2173dcg_check 2172dcg_get 2175dcg_init 2171dcgmrhs 2179dcgmrhs_check 2178dcgmrhs_get 2182dcgmrhs_init 2176dfgmres 2185dfgmres_check 2184

3049

Index

Sparse Solver (continued)iterative sparse solver interface (continued)

dfgmres_get 2188dfgmres_init 2183

preconditioners based on incomplete LUfactorization

dcsrilu0 2194Sparse Solvers 2119sparse vectors 176, 177, 179, 181, 182, 184, 185,

187, 188, 190adding and scaling 179complex dot product, conjugated 182complex dot product, unconjugated 184compressed form 177converting to compressed form 185, 187converting to full-storage form 190full-storage form 177Givens rotation 188norm 177passed to BLAS level 1 routines 177real dot product 181scaling 177

Specific Features of Fortran-95 Interfaces for LAPACKRoutines 2983split Cholesky factorization (band matrices) 772square matrix

1-norm estimationLAPACK 1222, 1224ScaLAPACK 1940

status checkingDFTI 2448

storage, of sparse matrices 2752stream 2292stream descriptor 2283stride. increment 2775sum

of magnitudes of the vectoxr elements 50of sparse vector and full-storage vector 179of vectors 51

sum of squaresupdating


support routinesMKL_FreeBuffers 2715

SVD (singular value decomposition)LAPACK 643ScaLAPACK 1789

swapping adjacent diagonal blocks 1265, 1559

swapping vectors 71Sylvester's equation 831symmetric band matrix


Symmetric Eigenproblems 917symmetric indefinite matrix

factorization with diagonal pivoting method 1555symmetric matrix 108, 111, 114, 116, 118, 121, 123,

159, 162, 166, 313, 320, 344, 349, 380, 384, 446, 450, 675, 754, 756, 1190, 1192, 1194, 1196, 1508, 1551, 1553, 1849, 1852, 1973, 2026, 2083, 2086

Bunch-Kaufman factorizationpacked storage 320

eigenvalues and eigenvectors 1849, 1852estimating the condition number

packed storage 384generalized eigenvalue problems 756inverting the matrix

packed storage 450matrix-vector product

band storage 108packed storage 111, 1190

rank-1 updatepacked storage 114, 1192


rank-2k update 166rank-n update 162reducing to standard form


reducing to tridiagonal formLAPACK 1508ScaLAPACK 2026

scalar-matrix-matrix product 159scaling 1973solving systems of linear equations

packed storage 349symmetric matrix in packed form


symmetric positive definite distributed matrixcomputing scaling factors 1664

3050


symmetric positive definite distributed matrix(continued)

equilibration 1664symmetric positive-definite band matrix

Cholesky factorization 1545symmetric positive-definite distributed matrix

inverting the matrix 1658symmetric positive-definite matrix

Cholesky factorizationband storage 309, 1603LAPACK 304, 1547packed storage 306ScaLAPACK 1601, 2080

estimating the condition numberband storage 375packed storage 373tridiagonal matrix 378

inverting the matrixpacked storage 444

solving systems of linear equationsband storage 339, 1618LAPACK 334packed storage 336ScaLAPACK 1616

symmetric positive-definite tridiagonal matrixsolving systems of linear equations 1621

symmetrically structured systems 2754system of linear equations

with a triangular matrixband storage 129packed storage 136

systems of linear equations 325, 331, 342, 2106linear equations 2106

systems of linear equationslinear equations 1611

Ttiming functions

getcpuclocks 2714getcpufrequency 2714second/dsecnd 2713setcpufrequency 2715

TR routinesdtrnlsp_delete 2651dtrnlsp_get 2649dtrnlsp_init 2646dtrnlsp_solve 2647dtrnlspbc_delete 2671

TR routines (continued)dtrnlspbc_get 2670dtrnlspbc_init 2667dtrnlspbc_solve 2668nonlinear least-squares problem

with linear bound constraints 2666without constraints 2645

organization and implementation 2643transposition parameter 2780trapezoidal matrix

1-norm value 1331Frobenius norm 1331infinity- norm 1331largest absolute value of element 1331reduction to triangular form 2034RZ factorization


triangular band matrix1-norm value 1327Frobenius norm 1327infinity- norm 1327largest absolute value of element 1327

triangular banded equationsLAPACK 1501ScaLAPACK 2071

triangular distributed matrixinverting the matrix 1660

triangular factorizationband matrix 300, 1596, 1599, 1906, 2103general matrix 297, 1594Hermitian matrix


band storage 309, 1603packed storage 306tridiagonal matrix 311, 1606


symmetric positive-definite matrixband storage 309, 1603packed storage 306tridiagonal matrix 311, 1606

tridiagonal matrixLAPACK 302ScaLAPACK 2105

triangular matrix 125, 129, 133, 136, 138, 141, 170, 354, 357, 360, 389, 391, 394, 454, 456,

3051

Index

triangular matrix (continued)774, 834, 1331, 1519, 1520, 1559, 1566, 1630, 1775, 1964, 2037, 2039, 2090

1-norm valueLAPACK 1331ScaLAPACK 1964

estimating the condition numberband storage 394packed storage 391



inverting the matrixLAPACK 1566packed storage 456ScaLAPACK 2090


matrix-vector productband storage 125packed storage 133

productblocked algorithm 1520, 2039LAPACK 1519, 1520ScaLAPACK 2037, 2039unblocked algorithm 1519

ScaLAPACK 1775scalar-matrix-matrix product 170solving systems of linear equations

band storage 129, 360packed storage 136, 357ScaLAPACK 1630

swapping adjacent diagonal blocks 1559triangular matrix in packed form


triangular system of equationssolving with scale factor


tridaigonal system of equations 1548tridiagonal matrix 331, 342, 368, 675, 2106

estimating the condition number 368

tridiagonal matrix (continued)solving systems of linear equations

ScaLAPACK 2106tridiagonal triangular factorization

band matrix 1910tridiagonal triangular system of equations 2076trigonometric transform

backward cosine 2558backward sine 2558backward staggered cosine 2558forward cosine 2558forward sine 2557forward staggered cosine 2558

Trigonometric Transform interface 2557, 2559, 2562, 2563, 2566, 2568, 2570, 2906

code examples 2906routines

?_backward_trig_transform 2568?_commit_trig_transform 2563?_forward_trig_transform 2566?_init_trig_transform 2562free_trig_transform 2570

Trigonometric Transforms interface 2561trust region algorithm 3000TT interface 2557TT routines 2561two matrices

QR factorizationLAPACK 635ScaLAPACK 1738

UUniform (continuous) 2326Uniform (discrete) 2365UniformBits 2367unitary matrix 643, 675, 774, 834, 1526, 1527, 1529,

1531, 1532, 1775, 1789, 2041, 2044, 2047, 2050, 2053

from LQ factorizationLAPACK 1529ScaLAPACK 2047

from QL factorizationLAPACK 1526, 1532ScaLAPACK 2041, 2053

from QR factorizationLAPACK 1527ScaLAPACK 2044

3052


unitary matrix (continued)from RQ factorization


ScaLAPACK 1775, 1789Unpack Functions 2203updating

rank-1general matrix 84Hermitian matrix 96, 103real symmetric matrix 114, 121

rank-1, conjugatedgeneral matrix 86

rank-1, unconjugatedgeneral matrix 88

rank-2Hermitian matrix 98, 105symmetric matrix 116, 123

rank-2kHermitian matrix 155symmetric matrix 166

rank-nHermitian matrix 152symmetric matrix 162

updating:rank-1Hermitian matrix


packed storage 114updating:rank-2



upper Hessenberg matrix 774, 834, 1313, 1775, 19581-norm value





ScaLAPACK 1775user time 1587

Vvector arguments 177, 2775, 2776

array dimension 2775default 2776examples 2775increment 2775length 2775matrix one-dimensional substructures 2775sparse vector 177

vector conjugation 1184, 1902vector indexing 2205vector mathematical functions 2207, 2208, 2210,

2211, 2213, 2214, 2215, 2216, 2218, 2221, 2222, 2224, 2226, 2227, 2229, 2231, 2232, 2234, 2235, 2237, 2239, 2240, 2242, 2244, 2245, 2247, 2249, 2250, 2252, 2253, 2255, 2256, 2257, 2258, 2259, 2261, 2262

complementary error function value 2252computing a rounded integer value and raisinginexact result exception 2261computing a rounded integer value in currentrounding mode 2259computing a truncated integer value 2262cosine 2227cube root 2214denary logarithm 2226division 2210error function value 2250exponential 2222four-quadrant arctangent 2239hyperbolic cosine 2240hyperbolic sine 2242hyperbolic tangent 2244inverse cosine 2234inverse cube root 2215inverse error function value 2253inverse hyperbolic cosine 2245inverse hyperbolic sine 2247inverse hyperbolic tangent 2249inverse sine 2235inverse square root 2213inverse tangent 2237inversion 2208natural logarithm 2224power 2216power (constant) 2218rounding to nearest integer value 2258rounding towards minus infinity 2255

3053

Index

vector mathematical functions (continued)rounding towards plus infinity 2256rounding towards zero 2257sine 2229sine and cosine 2231square root 2211square root of sum of squares 2221tangent 2232

Vector Mathematical Functions 2201vector multilication


vector pack function 2264vector statistics functions

Bernoulli 2369Beta 2362Binomial 2373Cauchy 2346CopyStream 2308CopyStreamState 2309DeleteStream 2307dNewAbstractStream 2301Exponential 2337Gamma 2358Gaussian 2329GaussianMV 2332Geometric 2371GetBrngProperties 2389GetNumRegBrngs 2322GetStreamStateBrng 2321Gumbel 2356Hypergeometric 2376iNewAbstractStream 2299Laplace 2340LeapfrogStream 2313LoadStreamF 2312Lognormal 2352NegBinomial 2383NewStream 2296NewStreamEx 2297Poisson 2378PoissonV 2381Rayleigh 2349RegisterBrng 2388SaveStreamF 2310SkipAheadStream 2317sNewAbstractStream 2304Uniform (continuous) 2326Uniform (discrete) 2365

vector statistics functions (continued)UniformBits 2367Weibull 2343

vector unpack function 2266vector-scalar product 69, 179

sparse vectors 179vectors

adding magnitudes of vector elements 50copying 53dot product

complex vectors 59complex vectors, conjugated 57real vectors 54

element with the largest absolute value 72element with the largest absolute value of real partand its index 1904element with the smallest absolute value 73Euclidean norm 60Givens rotation 63linear combination of vectors 51modified Givens transformation parameters 67rotation of points 61rotation of points in the modified plane 64sparse vectors 177sum of vectors 51swapping 71vector-scalar product 69

vmlFunctions Interface 2203Input Parameters 2204Output Parameters 2205

VML 2201VML functions

mathematical functionsAcos 2234Acosh 2245Asin 2235Asinh 2247Atan 2237Atan2 2239Atanh 2249Cbrt 2214Ceil 2256Cos 2227Cosh 2240Div 2210Erf 2250Erfc 2252ErfInv 2253

3054


VML functions (continued)mathematical functions (continued)

Exp 2222Floor 2255Hypot 2221Inv 2208InvCbrt 2215InvSqrt 2213Ln 2224Log10 2226Modf 2262NearbyInt 2259Pow 2216Powx 2218Rint 2261Round 2258Sin 2229SinCos 2231Sinh 2242Sqrt 2211Tan 2232Tanh 2244Trunc 2257

pack/unpack functionsPack 2264Unpack 2266

service functionsClearErrorCallBack 2280ClearErrStatus 2275GetErrorCallBack 2279GetErrStatus 2275GetMode 2272SetErrorCallBack 2276SetErrStatus 2273SetMode 2269

VML Mathematical Functions 2203VML Pack Functions 2203VML Pack/Unpack Functions 2263VML Service Functions 2269VSL Fortran header 2281VSL routines

advanced service subroutinesGetBrngProperties 2389RegisterBrng 2388

convolution/correlationCopyTask 2435DeleteTask 2433Exec 2423Exec1D 2425

VSL routines (continued)convolution/correlation (continued)

ExecX 2428ExecX1D 2431NewTask 2403NewTask1D 2406NewTaskX 2408NewTaskX1D 2411SetInternalPrecision 2417

generator subroutinesBernoulli 2369Beta 2362Binomial 2373Cauchy 2346Exponential 2337Gamma 2358Gaussian 2329GaussianMV 2332Geometric 2371Gumbel 2356Hypergeometric 2376Laplace 2340Lognormal 2352NegBinomial 2383Poisson 2378PoissonV 2381Rayleigh 2349Uniform (continuous) 2326Uniform (discrete) 2365UniformBits 2367Weibull 2343

service subroutinesCopyStream 2308CopyStreamState 2309DeleteStream 2307dNewAbstractStream 2301GetNumRegBrngs 2322GetStreamStateBrng 2321iNewAbstractStream 2299LeapfrogStream 2313LoadStreamF 2312NewStream 2296NewStreamEx 2297SaveStreamF 2310SkipAheadStream 2317sNewAbstractStream 2304

VSL routines:convolution/correlationSetInternalDecimation 2420SetMode 2415

3055

Index

VSL routines:convolution/correlation (continued)SetStart 2418

WWeibull 2343Wilkinson transform 2040

Xxerbla 2710xerbla, error reporting routine 45, 1588, 2206

Zzlag2c 1571

3056


Intel(R) Math Kernel Library Reference Manual

Documents

Transcript of Intel(R) Math Kernel Library Reference Manual